<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI Solutions Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/ai-solutions/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/ai-solutions/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Mon, 03 Jul 2023 08:09:12 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>AI Solutions Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/ai-solutions/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Deploy a custom Docker image for Data Science project – A spam classifier with FastAPI (Part 3)</title>
		<link>https://blog.ovhcloud.com/deploy-a-custom-docker-image-for-data-science-project-a-spam-classifier-with-fastapi-part-3/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Fri, 30 Dec 2022 10:39:54 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Notebook]]></category>
		<category><![CDATA[AI Solutions]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Scikit Learn]]></category>
		<category><![CDATA[spam classification]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=24202</guid>

					<description><![CDATA[A guide to deploy a custom Docker image for an API with FastAPI and AI Deploy. Welcome to the third article concerning custom Docker image deployment. If you haven&#8217;t read the previous ones, you can check it: &#8211; Gradio sketch recognition app&#8211; Streamlit app for EDA and interactive prediction When creating code for a Data [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em>A guide to deploy a custom Docker image for an API with <a href="https://fastapi.tiangolo.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">FastAPI</a> and <strong>AI Deploy</strong>.</em></p>



<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="815" src="https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-spam-classifier-1024x815.jpg" alt="fastapi for spam classification" class="wp-image-24226" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-spam-classifier-1024x815.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-spam-classifier-300x239.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-spam-classifier-768x612.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-spam-classifier-1536x1223.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-spam-classifier.jpg 1620w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p><em>Welcome to the third article concerning <strong>custom Docker image deployment</strong>. If you haven&#8217;t read the previous ones, you can check them out:</em></p>



<p><em>&#8211; </em><a href="https://blog.ovhcloud.com/deploy-a-custom-docker-image-for-data-science-project-gradio-sketch-recognition-app-part-1/" data-wpel-link="internal">Gradio sketch recognition app</a><br><em>&#8211; </em><a href="https://docs.ovh.com/fr/publiccloud/ai/deploy/tuto-streamlit-eda-iris/" data-wpel-link="exclude">Streamlit app for EDA and interactive prediction</a></p>



<p>When creating code for a <strong>Data Science project</strong>, you probably want it to be as portable as possible. In other words, it can be run as many times as you like, even on different machines.</p>



<p>Unfortunately, it is often the case that a Data Science code works fine locally on a machine but gives errors during runtime. It can be due to different versions of libraries installed on the host machine.</p>



<p>To deal with this problem, you can use <a href="https://www.docker.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Docker</a>.</p>



<p><strong>The article is organized as follows:</strong></p>



<ul class="wp-block-list">
<li>Objectives</li>



<li>Concepts</li>



<li>Define a model for spam classification</li>



<li>Build the FastAPI app with Python</li>



<li>Containerize your app with Docker</li>



<li>Launch the app with AI Deploy</li>
</ul>



<p><em>All the code for this blogpost is available in our dedicated <a href="https://github.com/ovh/ai-training-examples/tree/main/apps/fastapi/spam-classifier-api" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>. You can test it with the OVHcloud <strong>AI Deploy</strong> tool; please refer to the <a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/tuto-fastapi-spam-classifier/" data-wpel-link="exclude">documentation</a> to boot it up.</em></p>



<h2 class="wp-block-heading">Objectives</h2>



<p>In this article, you will learn how to develop a <strong>FastAPI</strong> API for spam classification.</p>



<p>Once your app is up and running locally, it will be a matter of containerizing it, then deploying the custom Docker image with AI Deploy.</p>



<figure class="wp-block-image size-large"><img decoding="async" width="2160" height="1215" src="https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-objective-edited.jpg" alt="objective of api deployment" class="wp-image-24228" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-objective-edited.jpg 2160w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-objective-edited-300x169.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-objective-edited-1024x576.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-objective-edited-768x432.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-objective-edited-1536x864.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-objective-edited-2048x1152.jpg 2048w" sizes="(max-width: 2160px) 100vw, 2160px" /></figure>



<h2 class="wp-block-heading">Concepts</h2>



<p>In Artificial Intelligence, you have probably heard of <strong>Natural Language Processing</strong> (NLP). <strong>NLP</strong> gathers several tasks related to language processing such as <strong>text classification</strong>.</p>



<p>This technique is ideal for distinguishing spam from other messages.</p>



<h3 class="wp-block-heading">Spam Ham Collection&nbsp;Dataset</h3>



<p>The <a href="https://archive.ics.uci.edu/ml/datasets/sms+spam+collection" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">SMS Spam Collection</a> is a public set of SMS labeled messages that have been collected for mobile phone spam research.</p>



<p>The dataset contains <strong>5,574 messages</strong> in English. The SMS messages are tagged as follows:</p>



<ul class="wp-block-list">
<li><strong>HAM</strong> if the message is legitimate</li>



<li><strong>SPAM</strong> if it is not</li>
</ul>



<p>The collection is a <strong>text file</strong>, where each line has the correct <strong>class</strong> followed by the raw <strong>message</strong>.</p>
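<p><em>As a quick illustration of this format, each line can be split on the tab character to recover the class and the raw message. A minimal sketch (the sample lines below are made up for illustration):</em></p>

```python
# Parse SMS Spam Collection-style lines: label<TAB>raw message.
# The sample lines are invented for illustration.
lines = [
    "ham\tOk, see you at the station at 5pm.",
    "spam\tWINNER!! You have been selected for a free prize. Call now!",
]

dataset = []
for line in lines:
    label, message = line.split("\t", 1)  # split only on the first tab
    dataset.append((label, message))

print(dataset[0])
```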



<figure class="wp-block-image aligncenter size-large is-resized"><img decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-5-1024x576.png" alt="spam ham dataset" class="wp-image-24219" width="773" height="435" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-5-1024x576.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-5-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-5-768x432.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-5-1536x864.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-5.png 1920w" sizes="(max-width: 773px) 100vw, 773px" /></figure>



<h3 class="wp-block-heading">Logistic regression</h3>



<p><strong>What is a Logistic Regression?</strong></p>



<p><a href="https://fr.wikipedia.org/wiki/R%C3%A9gression_logistique" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Logistic regression</a> is a statistical model. It lets you study the relationship between a set of <code>i</code> <strong>explanatory variables</strong> (<code>Xi</code>) and a <strong>qualitative variable</strong> (<code>Y</code>).</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-logistic-regression-1024x779.jpg" alt="logistic regression" class="wp-image-24229" width="467" height="355" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-logistic-regression-1024x779.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-logistic-regression-300x228.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-logistic-regression-768x584.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-logistic-regression-1536x1168.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-logistic-regression.jpg 1620w" sizes="auto, (max-width: 467px) 100vw, 467px" /></figure>



<p>It is a generalized linear model using a logistic function as a link function.</p>



<p>A logistic regression model can also predict the <strong>probability</strong> of an event occurring (value close to <code><strong>1</strong></code>) or not (value close to <strong><code>0</code></strong>) from the optimization of the <strong>regression coefficients</strong>. This result always varies between <strong><code>0</code></strong> and <strong><code>1</code></strong>.</p>
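<p><em>For reference, the standard formulation of this link function (general notation, not specific to this tutorial) is:</em></p>

```latex
P(Y = 1 \mid X) = \sigma\left(\beta_0 + \beta_1 X_1 + \dots + \beta_i X_i\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

<p><em>Since <code>σ(z)</code> always lies strictly between <code>0</code> and <code>1</code>, the output can be read as a probability.</em></p>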



<p>For the spam classification use case, <strong>words</strong> are inputs and <strong>class</strong> (spam or ham) is output.</p>



<h3 class="wp-block-heading">FastAPI</h3>



<p><strong>What is FastAPI?</strong></p>



<p><a href="https://fastapi.tiangolo.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">FastAPI</a> is a web framework for building <strong>RESTful APIs</strong> with Python.</p>



<p>FastAPI is based on <a href="https://docs.pydantic.dev/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Pydantic</a> and Python type hints to <em>validate</em>, <em>serialize</em> and <em>deserialize</em> data, and to automatically generate OpenAPI documents.</p>
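<p><em>As a tiny illustration (the class name here is hypothetical, not part of the app built below), Pydantic validates incoming data against the declared type hints:</em></p>

```python
from pydantic import BaseModel

# Hypothetical model for illustration only; the app below
# defines its own request class.
class Message(BaseModel):
    message: str

m = Message(message="Hello there")
print(m.message)
```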



<h3 class="wp-block-heading">Docker</h3>



<p><a href="https://www.docker.com/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Docker</a>&nbsp;is a platform that allows you to build, run and manage isolated applications. The principle is to package an application with not only the written code but also all the context needed to run it: for example, the libraries and their versions.</p>



<p>When you wrap your application with all its context, you build a Docker image, which can be saved in your local repository or on Docker Hub.</p>



<p>To get started with Docker, please, check this&nbsp;<a href="https://www.docker.com/get-started" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">documentation</a>.</p>



<p>To build a Docker image, you will define 2 elements:</p>



<ul class="wp-block-list">
<li>the application code (<em>FastAPI app</em>)</li>



<li>the&nbsp;<a href="https://docs.docker.com/engine/reference/builder/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Dockerfile</a></li>
</ul>



<p>In the next steps, you will see how to develop the Python code for your app, but also how to write the Dockerfile.</p>



<p>Finally, you will see how to deploy your custom Docker image with the&nbsp;<strong>OVHcloud AI Deploy</strong>&nbsp;tool.</p>



<h3 class="wp-block-heading">AI Deploy</h3>



<p><strong>AI Deploy</strong>&nbsp;enables AI models and managed applications to be started via Docker containers.</p>



<p>To learn more about AI Deploy, please refer to this&nbsp;<a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/getting-started/" data-wpel-link="exclude">documentation</a>.</p>



<h2 class="wp-block-heading">Define a model for spam classification</h2>



<p>❗ <strong><code>To develop an API that uses a Machine Learning model, you have to load the model in the correct format. For this tutorial, a Logistic Regression is used; it is defined in the Python file model.py</code></strong>.<br><br><code><strong>To better understand the model.py code, refer to the <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/natural-language-processing/text-classification/miniconda/spam-classifier/notebook-spam-classifier.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">notebook</a>, which details all the steps</strong></code>.</p>



<p>First of all, you have to import the&nbsp;<strong>Python libraries</strong>&nbsp;needed to create the Logistic Regression in the <code>model.py</code> file.</p>



<pre class="wp-block-code"><code class="">import pandas as pd
import numpy as np
from sklearn import model_selection
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression</code></pre>



<p>Now, you can create the Logistic Regression based on the <strong>Spam Ham Collection&nbsp;Dataset</strong>. The Python library <strong>Scikit-Learn</strong> is used to define this model.</p>



<p>Firstly, you can load the dataset and transform your input file into a <code>dataframe</code>.</p>



<p>You will also be able to define the <code>input</code> and the <code>output</code> of the model.</p>



<pre class="wp-block-code"><code class="">def load_data():

    PATH = 'SMSSpamCollection'
    df = pd.read_csv(PATH, delimiter = "\t", names=["classe", "message"])

    X = df['message']
    y = df['classe']

    return X, y</code></pre>



<p>In a second step, you split the data into a training set and a test set.</p>



<p>To <strong>separate the dataset fairly</strong> and to have a <code>test_size</code> between 0 and 1, you can calculate <code>ntest</code> as follows.</p>



<pre class="wp-block-code"><code class="">def split_data(X, y):

    ntest = 2000/(3572+2000)

    X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=ntest, random_state=0)

    return X_train, y_train</code></pre>



<p>Now you can concentrate on creating the <strong>Machine Learning model</strong>. To do this, create a <code>spam_classifier_model</code> function.</p>



<p>To fully understand the code, refer to <strong>Steps 6 to 9</strong> of this <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/natural-language-processing/text-classification/miniconda/spam-classifier/notebook-spam-classifier.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">notebook</a>. In these steps you will learn how to:</p>



<ul class="wp-block-list">
<li>create the model using <strong>Logistic Regression</strong></li>



<li>evaluate on the test set</li>



<li>do <strong>dimension reduction</strong> with stop words and term frequency</li>



<li>do <strong>dimension reduction</strong> as post-processing of the model</li>
</ul>



<pre class="wp-block-code"><code class="">def spam_classifier_model(Xtrain, ytrain):

    model_logistic_regression = LogisticRegression()
    model_logistic_regression = model_logistic_regression.fit(Xtrain, ytrain)

    coeff = model_logistic_regression.coef_
    coef_abs = np.abs(coeff)

    quantiles = np.quantile(coef_abs,[0, 0.25, 0.5, 0.75, 0.9, 1])

    index = np.where(coeff[0] &gt; quantiles[1])
    newXtrain = Xtrain[:, index[0]]

    model_logistic_regression = LogisticRegression()
    model_logistic_regression.fit(newXtrain, ytrain)

    return model_logistic_regression, index</code></pre>



<p>Once these Python functions are defined, you can call and apply them as follows.</p>



<p>Firstly, extract input and output data with <code>load_data()</code>:</p>



<pre class="wp-block-code"><code class="">data_input, data_output = load_data()</code></pre>



<p>Secondly, split the data using <code>split_data(data_input, data_output)</code>:</p>



<pre class="wp-block-code"><code class="">X_train, ytrain = split_data(data_input, data_output)</code></pre>



<p>❗ <code><strong>Here, there is no need to use the test set. Indeed, the evaluation of the final model has already been done in <em>Step 9 - Dimensionality reduction: post processing of the model</em> of the <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/natural-language-processing/text-classification/miniconda/spam-classifier/notebook-spam-classifier.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">notebook</a>.</strong></code></p>



<p>Thirdly, <strong>transform</strong> and <strong>fit</strong> the training set. To prepare the data, you can use <code>CountVectorizer</code> from Scikit-Learn to remove <strong>stop words</strong> and then <code>fit_transform</code> to fit the inputs.</p>



<pre class="wp-block-code"><code class="">vectorizer = CountVectorizer(stop_words='english', binary=True, min_df=10)
Xtrain = vectorizer.fit_transform(X_train.tolist())
Xtrain = Xtrain.toarray()</code></pre>



<p>Fourthly, get the model and index for prediction by calling the <code>spam_classifier_model</code> function.</p>



<pre class="wp-block-code"><code class="">model_logistic_regression, index = spam_classifier_model(Xtrain, ytrain)</code></pre>



<p>Find the full Python code <a href="https://github.com/ovh/ai-training-examples/blob/main/apps/fastapi/spam-classifier-api/model.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a>.</p>



<p>Have you successfully defined your model? Good job 🥳 !</p>



<p>Let&#8217;s go for the creation of the API!</p>



<h2 class="wp-block-heading">Build the FastAPI app with Python</h2>



<p>❗ <code><strong>All the codes below are available in the <em>app.py</em> file. You can find the complete Python code of the <em>app.py</em> file <a href="https://github.com/ovh/ai-training-examples/blob/main/apps/fastapi/spam-classifier-api/app.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a>.</strong></code></p>



<p>To begin, you can import the dependencies for the FastAPI app.</p>



<ul class="wp-block-list">
<li>uvicorn</li>



<li>fastapi</li>



<li>pydantic</li>
</ul>



<pre class="wp-block-code"><code class="">import uvicorn
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from model import model_logistic_regression, index, vectorizer</code></pre>



<p>In the first place, you can initialize an instance of FastAPI.</p>



<pre class="wp-block-code"><code class="">app = FastAPI()</code></pre>



<p>Next, you can define the data format by creating the Python class named <code>request_body</code>. Here, the <strong>string</strong> (<code>str</code>) format is required.</p>



<pre class="wp-block-code"><code class="">class request_body(BaseModel):
    message : str</code></pre>



<p>Now, you can create the <code>process_message</code> function to prepare the sent message for the model.</p>



<pre class="wp-block-code"><code class="">def process_message(message):

    desc = vectorizer.transform(message)
    dense_desc = desc.toarray()
    dense_select = dense_desc[:, index[0]]

    return dense_select</code></pre>



<p>When this function returns, the message no longer contains any <strong>stop words</strong>: it has been put in the right format for the model by <code>transform</code>, and is then represented as an <code>array</code>.</p>
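<p><em>The column selection with <code>index[0]</code> is plain NumPy fancy indexing. A minimal standalone sketch (with a made-up array and index) shows the idea:</em></p>

```python
import numpy as np

# Made-up example: 2 vectorized "messages" with 4 features each.
dense_desc = np.array([[1, 0, 3, 0],
                       [0, 2, 0, 4]])

# Made-up index of the feature columns kept by the model,
# in the same shape as a result of np.where.
index = (np.array([0, 2]),)

dense_select = dense_desc[:, index[0]]  # keep only columns 0 and 2
print(dense_select.tolist())
```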



<p>Now that the function for processing the input data is defined, you can pass the <code>GET</code> and <code>POST</code> methods.</p>



<p>First, let&#8217;s go for the <code>GET</code> method!</p>



<pre class="wp-block-code"><code class="">@app.get('/')
def root():
    return {'message': 'Welcome to the SPAM classifier API'}</code></pre>



<p>Here you can see the <em>welcome message</em> when you arrive on your API.</p>



<pre class="wp-block-preformatted"><code><strong>{"message":"Welcome to the SPAM classifier API"}</strong></code></pre>



<p>Now it&#8217;s the turn of the <code>POST</code> method. In this part of the code, you will be able to:</p>



<ul class="wp-block-list">
<li>define the message format</li>



<li>check if a message has been sent or not</li>



<li>process the message to fit with the model</li>



<li>extract the probabilities</li>



<li>return the results</li>
</ul>



<pre class="wp-block-code"><code class="">@app.post('/spam_detection_path')
def classify_message(data : request_body):

    if not data.message:
        raise HTTPException(status_code=400, detail="Please provide a valid text message")

    message = [
        data.message
    ]

    dense_select = process_message(message)

    label = model_logistic_regression.predict(dense_select)
    proba = model_logistic_regression.predict_proba(dense_select)

    if label[0]=='ham':
        label_proba = proba[0][0]
    else:
        label_proba = proba[0][1]

    return {'label': label[0], 'label_probability': label_proba}</code></pre>



<p><code><strong>❗ Again, you can find the full code <a href="https://github.com/ovh/ai-training-examples/blob/main/apps/fastapi/spam-classifier-api/app.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a></strong></code>.</p>



<p>Before deploying your API, you can test it locally using the following command:</p>



<pre class="wp-block-code"><code class="">uvicorn app:app --reload</code></pre>



<p>Then, you can test your app locally at the following address:&nbsp;<strong><code>http://localhost:8000/</code></strong></p>



<p>You will arrive on the following page:</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-4.png" alt="" class="wp-image-24217" width="590" height="721" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-4.png 760w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-4-245x300.png 245w" sizes="auto, (max-width: 590px) 100vw, 590px" /></figure>



<p><strong>How to interact with your&nbsp;API?</strong></p>



<p>You can add&nbsp;<code>/docs</code>&nbsp;at the end of the url of your&nbsp;app: <strong><code>http://localhost:8000/</code></strong><code><strong>docs</strong></code></p>



<p>A new page opens. It provides a complete dashboard for interacting with the&nbsp;API!</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image.png" alt="" class="wp-image-24213" width="590" height="722" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image.png 760w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-245x300.png 245w" sizes="auto, (max-width: 590px) 100vw, 590px" /></figure>



<p>To be able to send a message for classification, select&nbsp;<code><strong>/spam_detection_path</strong></code>&nbsp;in the green box. Click on<strong>&nbsp;<code>Try</code></strong><code><strong> it out</strong></code>&nbsp;and type the message of your choice in the dedicated&nbsp;zone.</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-2.png" alt="" class="wp-image-24215" width="596" height="729" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-2.png 760w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-2-245x300.png 245w" sizes="auto, (max-width: 596px) 100vw, 596px" /></figure>



<p>Enter the message of your choice. It must be in the form of a <code><strong>string</strong></code>. </p>



<p><em>Example:</em> <code><strong>"A new free service for you only"</strong></code></p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-1.png" alt="" class="wp-image-24214" width="599" height="733" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-1.png 760w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-1-245x300.png 245w" sizes="auto, (max-width: 599px) 100vw, 599px" /></figure>



<p>To get the result of the prediction, click on the&nbsp;<code><strong>Execute</strong></code>&nbsp;button.</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-3.png" alt="" class="wp-image-24216" width="611" height="748" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-3.png 760w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/image-3-245x300.png 245w" sizes="auto, (max-width: 611px) 100vw, 611px" /></figure>



<p>Finally, you obtain the result of the prediction with the&nbsp;<strong>label</strong>&nbsp;and the&nbsp;<strong>confidence&nbsp;score</strong>.</p>
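<p><em>You can also query the endpoint from a script rather than the browser. A minimal sketch using only the standard library builds the request; assuming the server is running on port <code>8000</code>, uncomment the last lines to actually send it:</em></p>

```python
import json
import urllib.request

# Build a POST request for the spam classification endpoint.
payload = json.dumps({"message": "A new free service for you only"}).encode()
req = urllib.request.Request(
    "http://localhost:8000/spam_detection_path",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(req.get_method(), req.full_url)

# With the server running, send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # a label and its probability
```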



<p>Your app works locally? Congratulations&nbsp;🎉 !</p>



<p>Now it’s time to move on to containerization!</p>



<h2 class="wp-block-heading">Containerize your app with Docker</h2>



<p>First of all, you have to build the file that will contain the different Python modules to be installed with their corresponding version.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="574" src="https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-docker-1024x574.jpg" alt="docker image datascience" class="wp-image-24230" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-docker-1024x574.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-docker-300x168.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-docker-768x430.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-docker-1536x861.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/12/draw-docker.jpg 1620w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading">Create the requirements.txt file</h3>



<p>The&nbsp;<code><a href="https://github.com/ovh/ai-training-examples/blob/main/apps/fastapi/spam-classifier-api/requirements.txt" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">requirements.txt</a></code>&nbsp;file lists all the modules needed to make our application work.</p>



<pre class="wp-block-code"><code class="">fastapi==0.87.0
pydantic==1.10.2
uvicorn==0.20.0
pandas==1.5.1
scikit-learn==1.1.3</code></pre>



<p>This file will be useful when writing the&nbsp;<code>Dockerfile</code>.</p>



<h3 class="wp-block-heading">Write the Dockerfile</h3>



<p>Your&nbsp;<code><a href="https://github.com/ovh/ai-training-examples/blob/main/apps/fastapi/spam-classifier-api/Dockerfile" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Dockerfile</a></code>&nbsp;should start with the&nbsp;<code>FROM</code>&nbsp;instruction indicating the parent image to use. In our case we choose to start from a classic Python image.</p>



<p>For this FastAPI app, you can use version&nbsp;<strong><code>3.8</code></strong>&nbsp;of Python.</p>



<pre class="wp-block-code"><code class="">FROM python:3.8</code></pre>



<p>Next, you have to specify the working directory and copy all the&nbsp;files into it.</p>



<p><code><strong>❗&nbsp;Here you must be in the /workspace directory. This is the base directory when launching an OVHcloud AI Deploy app.</strong></code></p>



<pre class="wp-block-code"><code class="">WORKDIR /workspace
ADD . /workspace</code></pre>



<p>Install the Python modules listed in your&nbsp;<code>requirements.txt</code>&nbsp;file using a&nbsp;<code>pip install</code>&nbsp;command.</p>



<pre class="wp-block-code"><code class="">RUN pip install -r requirements.txt</code></pre>



<p>Set the listening port of the&nbsp;container. For <strong>FastAPI</strong>, you can use the port <code>8000</code>.</p>



<pre class="wp-block-code"><code class="">EXPOSE 8000</code></pre>



<p>Then, you have to define the <strong>entrypoint</strong> and the <strong>default launching command</strong> to start the application.</p>



<pre class="wp-block-code"><code class="">ENTRYPOINT ["uvicorn"]
CMD [ "app:app", "--host", "0.0.0.0", "--port", "8000" ]</code></pre>



<p>Finally, you can give correct access rights to OVHcloud user (<code>42420:42420</code>).</p>



<pre class="wp-block-code"><code class="">RUN chown -R 42420:42420 /workspace
ENV HOME=/workspace</code></pre>
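<p><em>Putting the steps together, the assembled <code>Dockerfile</code> reads as follows (with <code>uvicorn</code> as the launch command for the FastAPI app):</em></p>

```dockerfile
FROM python:3.8

WORKDIR /workspace
ADD . /workspace

RUN pip install -r requirements.txt

EXPOSE 8000

ENTRYPOINT ["uvicorn"]
CMD [ "app:app", "--host", "0.0.0.0", "--port", "8000" ]

RUN chown -R 42420:42420 /workspace
ENV HOME=/workspace
```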



<p>Once your&nbsp;<code>Dockerfile</code>&nbsp;is defined, you will be able to build your custom Docker image.</p>



<h3 class="wp-block-heading">Build the Docker image from the Dockerfile</h3>



<p>First, you can launch the following command from the&nbsp;<code>Dockerfile</code>&nbsp;directory to build your application image.</p>



<pre class="wp-block-code"><code class="">docker build . -t fastapi-spam-classification:latest</code></pre>



<p>⚠️&nbsp;<strong><code>The dot . argument indicates that your build context (place of the Dockerfile and other needed files) is the current directory.</code></strong></p>



<p>⚠️&nbsp;<code><strong>The -t argument allows you to choose the identifier to give to your image. Usually image identifiers are composed of a name and a version tag &lt;name&gt;:&lt;version&gt;. For this example we chose fastapi-spam-classification:latest.</strong></code></p>



<h3 class="wp-block-heading">Test it locally</h3>



<p>Now, you can run the following&nbsp;<strong>Docker command</strong>&nbsp;to launch your application locally on your computer.</p>



<pre class="wp-block-code"><code class="">docker run --rm -it -p 8000:8000 --user=42420:42420 fastapi-spam-classification:latest</code></pre>



<p>⚠️&nbsp;<code><strong>The -p 8000:8000 argument indicates that you want to execute a port redirection from the port 8000 of your local machine into the port 8000 of the Docker container.</strong></code></p>



<p>⚠️<code><strong>&nbsp;Don't forget the --user=42420:42420 argument if you want to simulate the exact same behaviour that will occur on AI Deploy. It executes the Docker container as the specific OVHcloud user (user 42420:42420).</strong></code></p>



<p>Once started, your application should be available on&nbsp;<strong>http://localhost:8000</strong>.<br><br>Your Docker image seems to work? Good job&nbsp;👍 !<br><br>It’s time to push it and deploy it!</p>
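<p>Before pushing the image, you can also check that the server answers. For instance, <strong>FastAPI</strong> automatically exposes an interactive documentation page, so querying it from another terminal is a quick sanity check (assuming the container started with the previous command is still running):</p>

```shell
# The /docs page is served by FastAPI itself, whatever your API routes are
curl http://localhost:8000/docs
```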



<h3 class="wp-block-heading">Push the image into the shared registry</h3>



<p>❗&nbsp;The shared registry of AI Deploy should only be used for testing purposes. Please consider attaching your own Docker registry. More information about this can be found&nbsp;<a href="https://docs.ovh.com/asia/en/publiccloud/ai/training/add-private-registry/" data-wpel-link="exclude">here</a>.</p>



<p>Then, you have to find the address of your&nbsp;<code>shared registry</code>&nbsp;by launching this command.</p>



<pre class="wp-block-code"><code class="">ovhai registry list</code></pre>



<p>Next, log in on the shared registry with your usual&nbsp;<code>OpenStack</code>&nbsp;credentials.</p>



<pre class="wp-block-code"><code class="">docker login -u &lt;user&gt; -p &lt;password&gt; &lt;shared-registry-address&gt;</code></pre>



<p>To finish, you need to push the created image into the shared registry.</p>



<pre class="wp-block-code"><code class="">docker tag fastapi-spam-classification:latest &lt;shared-registry-address&gt;/fastapi-spam-classification:latest</code></pre>



<pre class="wp-block-code"><code class="">docker push &lt;shared-registry-address&gt;/fastapi-spam-classification:latest</code></pre>



<p>Once you have pushed your custom Docker image into the shared registry, you are ready to launch your app 🚀 !</p>



<h2 class="wp-block-heading">Launch the AI Deploy app</h2>



<p>The following command starts a new job running your <strong>FastAPI</strong> application.</p>



<pre class="wp-block-code"><code class="">ovhai app run \
      --default-http-port 8000 \
      --cpu 4 \
      &lt;shared-registry-address&gt;/fastapi-spam-classification:latest</code></pre>



<h3 class="wp-block-heading">Choose the compute resources</h3>



<p>First, you can choose either the number of GPUs or the number of CPUs for your app.</p>



<p><code><strong>--cpu 4</strong></code>&nbsp;indicates that we request 4 CPUs for that app.</p>



<h3 class="wp-block-heading">Make the app public</h3>



<p>Finally, add the&nbsp;<code><strong>--unsecure-http</strong></code>&nbsp;attribute if you want your application to be reachable without any authentication.</p>
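<p>For example, with this flag the launch command becomes (a sketch; the flag order does not matter):</p>

```shell
ovhai app run \
      --unsecure-http \
      --default-http-port 8000 \
      --cpu 4 \
      <shared-registry-address>/fastapi-spam-classification:latest
```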






<h2 class="wp-block-heading">Conclusion</h2>



<p>Well done 🎉&nbsp;! You have learned how to build your&nbsp;<strong>own Docker image</strong>&nbsp;for a dedicated&nbsp;<strong>spam classification API</strong>!</p>



<p>You have also been able to deploy this app thanks to&nbsp;<strong>OVHcloud’s AI Deploy</strong>&nbsp;tool.</p>



<h3 class="wp-block-heading" id="want-to-find-out-more">Want to find out more?</h3>



<h5 class="wp-block-heading"><strong>Notebook</strong></h5>



<p>You want to access the notebook? Refer to the&nbsp;<a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/natural-language-processing/text-classification/miniconda/spam-classifier/notebook-spam-classifier.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.</p>



<h5 class="wp-block-heading"><strong>App</strong></h5>



<p>You want to access the full code to create the <strong>FastAPI</strong> API? Refer to the&nbsp;<a href="https://github.com/ovh/ai-training-examples/tree/main/apps/fastapi/spam-classifier-api" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>.<br><br>To launch and test this app with&nbsp;<strong>AI Deploy</strong>, please refer to&nbsp;our&nbsp;<a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/tuto-fastapi-spam-classifier/" data-wpel-link="exclude">documentation</a>.</p>



<h2 class="wp-block-heading">References</h2>



<ul class="wp-block-list">
<li><a href="https://towardsdatascience.com/how-to-run-a-data-science-project-in-a-docker-container-2ab1a3baa889" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">How to Run a Data Science Project in a Docker Container</a></li>



<li><a href="https://towardsdatascience.com/step-by-step-approach-to-build-your-machine-learning-api-using-fast-api-21bd32f2bbdb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Step-by-step Approach to Build Your Machine Learning API Using Fast API</a></li>
</ul>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to build a Speech-To-Text Application with Python (3/3)</title>
		<link>https://blog.ovhcloud.com/how-to-build-a-speech-to-text-application-with-python-3-3/</link>
		
		<dc:creator><![CDATA[Mathieu Busquet]]></dc:creator>
		<pubDate>Mon, 26 Dec 2022 14:22:42 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Apps]]></category>
		<category><![CDATA[AI Solutions]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[PyTorch]]></category>
		<category><![CDATA[Streamlit]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=23823</guid>

					<description><![CDATA[A tutorial to create and build your own Speech-To-Text Application with Python. At the end of this third article, your Speech-To-Text Application will offer many new features such as speaker differentiation, summarization, video subtitles generation, audio trimming, and others! Final code of the app is available in our dedicated GitHub repository. Overview of our final [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p id="block-972dc647-4202-432e-86f0-434b7dd789f0"><em>A tutorial to create and build your own <strong>Speech-To-Text Application</strong></em> with Python.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="576" src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-3-1024x576.png" alt="speech to text app image3" class="wp-image-24060" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-3-1024x576.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-3-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-3-768x432.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-3-1536x864.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-3.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p id="block-799b7c8e-3686-469c-b530-a568cfcce605">At the end of this <strong>third article</strong>, your Speech-To-Text Application will offer <strong>many new features</strong> such as speaker differentiation, summarization, video subtitles generation, audio trimming, and others!</p>



<p id="block-b8ed3876-3e4c-42cb-b83e-4e4f3c9b13b3"><em>Final code of the app is available in our dedicated <a href="https://github.com/ovh/ai-training-examples/tree/main/apps/streamlit/speech-to-text" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>.</em></p>



<h3 class="wp-block-heading" id="block-da716ed2-9734-494e-be61-539249c19438">Overview of our final Speech to Text Application</h3>



<figure class="wp-block-image aligncenter" id="block-2d8b0805-62db-4814-b015-efba81a8520a"><img decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview-1024x575.png" alt="Overview of the speech to text application"/></figure>



<p class="has-text-align-center" id="block-865fd283-6e3d-47f5-96cb-fdf47b60ffd2"><em>Overview of our final Speech-To-Text application</em></p>



<h3 class="wp-block-heading" id="block-1df7dcfc-8052-426f-ac67-ebc251d14185">Objective</h3>



<p id="block-812edfb1-6bba-47c0-98f0-830353e3d5c6">In the <a href="https://blog.ovhcloud.com/how-to-build-a-speech-to-text-application-with-python-2-3/" target="_blank" rel="noreferrer noopener" data-wpel-link="internal">previous article</a>, we created a form where the user can select the options they want to use. </p>



<p id="block-812edfb1-6bba-47c0-98f0-830353e3d5c6">Now that this form is created, it&#8217;s time to <strong>deploy the features</strong>!</p>



<p id="block-2fa073cc-8250-4740-a707-1a1b67dff94d">This article is organized as follows:</p>



<ul class="wp-block-list">
<li>Trim an audio file</li>



<li>Punctuate the transcript</li>



<li>Differentiate speakers with diarization</li>



<li>Display the transcript correctly</li>



<li>Rename speakers</li>



<li>Create subtitles for videos (<em>.SRT</em>)</li>



<li>Update old code</li>
</ul>



<p id="block-d4e59d83-200a-4e2b-b502-d5fe00941adb"><em>⚠️ Since this article uses code already explained in the previous <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/natural-language-processing/speech-to-text/miniconda" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">notebook tutorials</a>, we will not re-explain it here. We therefore recommend that you read the notebooks first.</em></p>



<h3 class="wp-block-heading" id="block-1d231b87-2963-4dba-9df6-2f7dc4acb93c">Trim an audio file ✂️</h3>



<p>The first option we are going to add is the ability to trim an audio file. </p>



<p>Indeed, if the user&#8217;s audio file is <strong>several minutes long</strong>, it is possible that the user only wants to <strong>transcribe a part of it</strong> to save some time. This is where the <strong>sliders</strong> of our form become useful. They allow the user to <strong>change default start &amp; end values</strong>, which determine which part of the audio file is transcribed.</p>



<p><em>For example, if the user&#8217;s file is 10 minutes long, the user can use the sliders to indicate that he only wants to transcribe the [00:30 -&gt; 02:30] part, instead of the full audio file.</em></p>



<p id="block-837f49db-342e-4f92-8f7d-ae9b97ee51c2">⚠️ With this functionality, we must <strong>check the values </strong>set by the user! Indeed, imagine that the user selects an <em>end</em> value which is lower than the <em>start</em> one (e.g. the transcript would run from start=40s to end=20s): this would be problematic.</p>



<p id="block-6bcd6e87-53cf-40d4-8b94-9f97c988587a">This is why you need to <strong>add the following function</strong> to your code, to rectify the potential errors:</p>



<pre id="block-00415e75-ed9c-4257-a5d2-0d033f419082" class="wp-block-code"><code class="">def correct_values(start, end, audio_length):
    """
    Start or/and end value(s) can be in conflict, so we check these values
    :param start: int value (s) given by st.slider() (fixed by user)
    :param end: int value (s) given by st.slider() (fixed by user)
    :param audio_length: audio duration (s)
    :return: approved values
    """
    # Start &amp; end Values need to be checked

    if start &gt;= audio_length or start &gt;= end:
        start = 0
        st.write("Start value has been set to 0s because of conflicts with other values")

    if end &gt; audio_length or end == 0:
        end = audio_length
        st.write("End value has been set to maximum value because of conflicts with other values")

    return start, end</code></pre>



<p id="block-a1ab0c76-ddb2-4abd-99ef-15f001529dae">If one of the values has been changed, we immediately <strong>inform the user</strong> with a <em>st.write().</em></p>



<p>We will call this function in the <em>transcription()</em> function, that we will rewrite at the end of this tutorial.</p>
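<p>To illustrate the behaviour outside of Streamlit, here is a minimal, dependency-free sketch of the same clamping logic (the <em>st.write()</em> messages are omitted):</p>

```python
def clamp_interval(start, end, audio_length):
    """Reproduce the value correction of correct_values(), without the UI messages."""
    # A start beyond the audio length, or beyond the end value, is reset to 0
    if start >= audio_length or start >= end:
        start = 0
    # An end beyond the audio length, or left at 0, is reset to the full duration
    if end > audio_length or end == 0:
        end = audio_length
    return start, end

print(clamp_interval(40, 20, 600))   # conflicting values: start is reset to 0
print(clamp_interval(30, 150, 600))  # valid values are kept unchanged
```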



<h3 class="wp-block-heading" id="block-50095355-4c37-4c0a-9ade-0ca44e86fde9">Split a text</h3>



<p id="block-b2616dcc-f867-4200-b95d-46b386d7efe1">If you have read the notebooks, you probably remember that some models (punctuation &amp; summarization) have <strong>input size limitations</strong>.</p>



<p id="block-62890aae-06a4-43e0-a8eb-2a13f2d57fe6">Let&#8217;s <strong>reuse the <em>split_text()</em> function</strong>, used in the notebooks, which will allow to send our whole transcript to these models by small text blocks, limited to a <em>max_size</em> number of characters:</p>



<pre id="block-0669f22d-2b45-4053-a806-07c9e897fc8e" class="wp-block-code"><code class="">def split_text(my_text, max_size):
    """
    Split a text
    Maximum sequence length for this model is max_size.
    If the transcript is longer, it needs to be split by the nearest possible value to max_size.
    To avoid cutting words, we will cut on "." characters, and " " if there is not "."

    :return: split text
    """

    cut2 = max_size

    # First, we get indexes of "."
    my_split_text_list = []
    nearest_index = 0
    length = len(my_text)
    # We split the transcript in text blocks of size &lt;= max_size.
    if cut2 &gt;= length:
        my_split_text_list.append(my_text)
    else:
        while cut2 &lt;= length:
            cut1 = nearest_index
            cut2 = nearest_index + max_size
            # Find the best index to split

            dots_indexes = [index for index, char in enumerate(my_text[cut1:cut2]) if
                            char == "."]
            if dots_indexes != []:
                nearest_index = max(dots_indexes) + 1 + cut1
            else:
                spaces_indexes = [index for index, char in enumerate(my_text[cut1:cut2]) if
                                  char == " "]
                if spaces_indexes != []:
                    nearest_index = max(spaces_indexes) + 1 + cut1
                else:
                    nearest_index = cut2
            my_split_text_list.append(my_text[cut1: nearest_index])

        # Do not lose the end of the transcript: append the remaining part, if any
        if nearest_index &lt; length:
            my_split_text_list.append(my_text[nearest_index:])

    return my_split_text_list</code></pre>
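<p>The idea can be illustrated with a short, self-contained variant (a simplified sketch, not the exact function above): each chunk is cut at the last &#8220;.&#8221; found inside a window of <em>max_size</em> characters:</p>

```python
def split_on_dots(my_text, max_size):
    # Simplified illustration: cut each chunk at the last "." inside the window
    chunks, cursor = [], 0
    while cursor < len(my_text):
        window = my_text[cursor:cursor + max_size]
        cut = window.rfind(".")
        # No dot found, or last (short) window: keep the whole window
        if cut == -1 or len(window) < max_size:
            cut = len(window) - 1
        chunks.append(my_text[cursor:cursor + cut + 1])
        cursor += cut + 1
    return chunks

print(split_on_dots("First sentence. Second one. End.", 20))
```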



<h3 class="wp-block-heading" id="block-aeff41dd-fe6b-4b31-ad7a-e2916e00657d">Punctuate the transcript</h3>



<p>Now, we need to add the function that allows us to <strong>send a <em>transcript</em> to the punctuation model</strong> in order to punctuate it:</p>



<pre id="block-6316f13a-3018-4a13-8bd6-b35a593c82e0" class="wp-block-code"><code class="">def add_punctuation(t5_model, t5_tokenizer, transcript):
    """
    Punctuate a transcript
    transcript: string limited to 512 characters
    :return: Punctuated and improved (corrected) transcript
    """

    input_text = "fix: { " + transcript + " } &lt;/s&gt;"

    input_ids = t5_tokenizer.encode(input_text, return_tensors="pt", max_length=10000, truncation=True,
                                    add_special_tokens=True)

    outputs = t5_model.generate(
        input_ids=input_ids,
        max_length=256,
        num_beams=4,
        repetition_penalty=1.0,
        length_penalty=1.0,
        early_stopping=True
    )

    transcript = t5_tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)

    return transcript</code></pre>
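<p>For example, the prompt sent to the model is simply the raw transcript wrapped in the <em>fix</em> template used above:</p>

```python
transcript = "hello how are you today"

# Same prompt template as in add_punctuation()
input_text = "fix: { " + transcript + " } </s>"
print(input_text)  # fix: { hello how are you today } </s>
```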



<p>The punctuation feature is now ready. We will call these functions later.<br>For the summarization model, you don&#8217;t have to do anything else either. </p>



<h3 class="wp-block-heading" id="block-ed0fa2ed-80f4-4821-9931-68f055e12490">Differentiate speakers with diarization</h3>



<p id="block-7430afcf-a8eb-45a7-bb0c-ebf0e1c77767">Now, let&#8217;s reuse all the <strong>diarization functions</strong> studied in the notebook tutorials, so we can differentiate speakers during a conversation.</p>



<p id="block-42bf8e95-825c-4e63-bfe9-877000b5f8f2"><strong>Convert <em>mp3/mp4</em> files to </strong><em><strong>.wav</strong> </em></p>



<p id="block-42bf8e95-825c-4e63-bfe9-877000b5f8f2"><em>Remember pyannote&#8217;s diarization only accepts .wav files as input.</em></p>



<pre id="block-71aa8771-0c17-40c7-88f1-a2880f99755b" class="wp-block-code"><code class="">def convert_file_to_wav(aud_seg, filename):
    """
    Convert a mp3/mp4 in a wav format
    Needs to be modified if you want to convert a format which contains less or more than 3 letters

    :param aud_seg: pydub.AudioSegment
    :param filename: name of the file
    :return: name of the converted file
    """
    filename = "../data/my_wav_file_" + filename[:-3] + "wav"
    aud_seg.export(filename, format="wav")

    newaudio = AudioSegment.from_file(filename)

    return newaudio, filename</code></pre>



<p id="block-4bdf29a7-df3e-485c-a8a6-f7be0ef8bd58"><strong>Get diarization of an audio file</strong></p>



<p><em>The following function allows you to diarize an audio file</em>.</p>



<pre id="block-85e83a22-7dbb-41de-b4fd-ceea38b5086e" class="wp-block-code"><code class="">def get_diarization(dia_pipeline, filename):
    """
    Diarize an audio (find numbers of speakers, when they speak, ...)
    :param dia_pipeline: Pyannote's library (diarization pipeline)
    :param filename: name of a wav audio file
    :return: str list containing audio's diarization time intervals
    """
    # Get diarization of the audio
    diarization = dia_pipeline({'audio': filename})
    listmapping = diarization.labels()
    listnewmapping = []

    # Rename default speakers' names (Default is A, B, ...), we want Speaker0, Speaker1, ...
    number_of_speakers = len(listmapping)
    for i in range(number_of_speakers):
        listnewmapping.append("Speaker" + str(i))

    mapping_dict = dict(zip(listmapping, listnewmapping))
    diarization.rename_labels(mapping_dict, copy=False)
    # copy set to False so we don't create a new annotation, we modify the existing one

    return diarization, number_of_speakers</code></pre>



<p id="block-15d919bb-e051-452f-a69e-be1d61908437"><strong>Convert diarization results to timedelta objects</strong></p>



<p><em>This conversion makes it easy to manipulate the results. </em></p>



<pre id="block-32987985-e79c-415f-806a-37e8f1721fa3" class="wp-block-code"><code class="">def convert_str_diarlist_to_timedelta(diarization_result):
    """
    Extract from Diarization result the given speakers with their respective speaking times and transform them in pandas timedelta objects
    :param diarization_result: result of diarization
    :return: list with timedelta intervals and their respective speaker
    """

    # get speaking intervals from diarization
    segments = diarization_result.for_json()["content"]
    diarization_timestamps = []
    for sample in segments:
        # Convert segment in a pd.Timedelta object
        new_seg = [pd.Timedelta(seconds=round(sample["segment"]["start"], 2)),
                   pd.Timedelta(seconds=round(sample["segment"]["end"], 2)), sample["label"]]
        # Start and end = speaking duration
        # label = who is speaking
        diarization_timestamps.append(new_seg)

    return diarization_timestamps</code></pre>
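<p>For example, with a mocked diarization output (the real one comes from <em>pyannote</em>), the conversion gives easy-to-manipulate <em>pandas</em> objects:</p>

```python
import pandas as pd

# Mocked segments, mimicking the structure of diarization_result.for_json()["content"]
segments = [{"segment": {"start": 0.0, "end": 2.5}, "label": "Speaker0"},
            {"segment": {"start": 2.5, "end": 4.0}, "label": "Speaker1"}]

diarization_timestamps = [[pd.Timedelta(seconds=round(s["segment"]["start"], 2)),
                           pd.Timedelta(seconds=round(s["segment"]["end"], 2)),
                           s["label"]] for s in segments]

# Timedelta objects support arithmetic directly, e.g. the speaking duration:
print(diarization_timestamps[1][1] - diarization_timestamps[1][0])
```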



<p id="block-3c5ce61f-c4a2-4505-bd1f-d1844eee8949"><strong>Merge the diarization segments</strong> <strong>that follow each other and that mention the same speaker</strong></p>



<p id="block-3c5ce61f-c4a2-4505-bd1f-d1844eee8949"><em>This will reduce the number of audio segments we need to create, and will give less sequenced, less small transcripts, which will be more pleasant for the user.</em></p>



<pre id="block-346d48b0-6c06-4208-b86d-d5424f222e32" class="wp-block-code"><code class="">def merge_speaker_times(diarization_timestamps, max_space, srt_token):
    """
    Merge near times for each detected speaker (Same speaker during 1-2s and 3-4s -&gt; Same speaker during 1-4s)
    :param diarization_timestamps: diarization list
    :param max_space: Maximum temporal distance between two silences
    :param srt_token: Enable/Disable generate srt file (choice fixed by user)
    :return: list with timedelta intervals and their respective speaker
    """

    if not srt_token:
        threshold = pd.Timedelta(seconds=max_space/1000)

        index = 0
        length = len(diarization_timestamps) - 1

        while index &lt; length:
            # Same speaker and a gap smaller than the threshold between the two segments
            if diarization_timestamps[index + 1][2] == diarization_timestamps[index][2] and \
                    diarization_timestamps[index + 1][0] - threshold &lt;= diarization_timestamps[index][1]:
                diarization_timestamps[index][1] = diarization_timestamps[index + 1][1]
                del diarization_timestamps[index + 1]
                length -= 1
            else:
                index += 1
    return diarization_timestamps</code></pre>
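<p>As an illustration of the merging rule (a simplified, standalone sketch of the logic above), two consecutive segments of the same speaker separated by a small gap collapse into one:</p>

```python
from datetime import timedelta

def merge_same_speaker(segments, max_gap_ms):
    # Merge consecutive segments of the same speaker when the silence between them is short
    threshold = timedelta(milliseconds=max_gap_ms)
    merged = [list(segments[0])]
    for start, end, speaker in segments[1:]:
        if speaker == merged[-1][2] and start - merged[-1][1] <= threshold:
            merged[-1][1] = end                   # extend the previous segment
        else:
            merged.append([start, end, speaker])  # keep a separate segment
    return merged

segs = [[timedelta(seconds=1), timedelta(seconds=2), "Speaker0"],
        [timedelta(seconds=3), timedelta(seconds=4), "Speaker0"],
        [timedelta(seconds=10), timedelta(seconds=12), "Speaker0"]]
print(merge_same_speaker(segs, 2000))  # the first two segments are merged
```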



<p id="block-e36e9996-2809-450a-834b-0482eca3299f"><strong>Extend timestamps given by the diarization to avoid word cutting</strong></p>



<p id="block-b943d9c3-e469-4e9d-b7b7-23d5aa1235e7">Imagine we have a segment like [00:01:20 &#8211;&gt; 00:01:25], followed by [00:01:27 &#8211;&gt; 00:01:30].</p>



<p id="block-c7909bb8-7b9e-471e-a928-75d92375fedc">The diarization may not be perfectly accurate, and some sound may be missing between the segments (here, between 00:01:25 and 00:01:27). The transcription model will then have difficulty understanding what is being said in these segments.</p>



<p id="block-c7909bb8-7b9e-471e-a928-75d92375fedc">➡️ The solution consists in setting the end of the first segment and the start of the second one to 00:01:26, the midpoint between these two values.</p>



<pre id="block-df4216a7-0112-425a-b763-9acabdf5d7d9" class="wp-block-code"><code class="">def extending_timestamps(new_diarization_timestamps):
    """
    Extend timestamps between each diarization timestamp if possible, so we avoid word cutting
    :param new_diarization_timestamps: list
    :return: list with merged times
    """

    for i in range(1, len(new_diarization_timestamps)):
        if new_diarization_timestamps[i][0] - new_diarization_timestamps[i - 1][1] &lt;= timedelta(milliseconds=3000) and new_diarization_timestamps[i][0] - new_diarization_timestamps[i - 1][1] &gt;= timedelta(milliseconds=100):
            middle = (new_diarization_timestamps[i][0] - new_diarization_timestamps[i - 1][1]) / 2
            new_diarization_timestamps[i][0] -= middle
            new_diarization_timestamps[i - 1][1] += middle

    # Converting list so we have a milliseconds format
    for elt in new_diarization_timestamps:
        elt[0] = elt[0].total_seconds() * 1000
        elt[1] = elt[1].total_seconds() * 1000

    return new_diarization_timestamps</code></pre>
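<p>Using the example above ([00:01:20 → 00:01:25] followed by [00:01:27 → 00:01:30]), the midpoint rule can be sketched as follows, with both boundaries ending up at 00:01:26:</p>

```python
from datetime import timedelta

segs = [[timedelta(minutes=1, seconds=20), timedelta(minutes=1, seconds=25), "Speaker0"],
        [timedelta(minutes=1, seconds=27), timedelta(minutes=1, seconds=30), "Speaker1"]]

gap = segs[1][0] - segs[0][1]   # 2 s of audio assigned to no speaker
middle = gap / 2
segs[0][1] += middle            # end of the first segment moves forward
segs[1][0] -= middle            # start of the second segment moves backward

print(segs[0][1], segs[1][0])   # both are now 0:01:26
```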



<p><strong>Create &amp; Optimize the subtitles</strong></p>



<p>Some people naturally speak very quickly, and conversations can sometimes be heated. In both cases, there is a good chance that the transcribed text is very dense and not suitable for subtitles (too much text on screen hides the video). </p>



<p>We will therefore define the following function. Its role is to split a speech segment in two if its transcript is judged too long.</p>



<pre class="wp-block-code"><code class="">def optimize_subtitles(transcription, srt_index, sub_start, sub_end, srt_text):
    """
    Optimize the subtitles (avoid a too long reading when many words are said in a short time)
    :param transcription: transcript generated for an audio chunk
    :param srt_index: Numeric counter that identifies each sequential subtitle
    :param sub_start: beginning of the transcript
    :param sub_end: end of the transcript
    :param srt_text: generated .srt transcript
    """

    transcription_length = len(transcription)

    # Length of the transcript should be limited to about 42 characters per line to avoid this problem
    if transcription_length &gt; 42:
        # Split the timestamp and its transcript in two parts
        # Get the middle timestamp
        diff = (timedelta(milliseconds=sub_end) - timedelta(milliseconds=sub_start)) / 2
        middle_timestamp = str(timedelta(milliseconds=sub_start) + diff).split(".")[0]

        # Get the closest middle index to a space (we don't divide transcription_length/2 to avoid cutting a word)
        space_indexes = [pos for pos, char in enumerate(transcription) if char == " "]
        nearest_index = min(space_indexes, key=lambda x: abs(x - transcription_length / 2))

        # First transcript part
        first_transcript = transcription[:nearest_index]

        # Second transcript part
        second_transcript = transcription[nearest_index + 1:]

        # Add both transcript parts to the srt_text
        srt_text += str(srt_index) + "\n" + str(timedelta(milliseconds=sub_start)).split(".")[0] + " --&gt; " + middle_timestamp + "\n" + first_transcript + "\n\n"
        srt_index += 1
        srt_text += str(srt_index) + "\n" + middle_timestamp + " --&gt; " + str(timedelta(milliseconds=sub_end)).split(".")[0] + "\n" + second_transcript + "\n\n"
        srt_index += 1
    else:
        # Add transcript without operations
        srt_text += str(srt_index) + "\n" + str(timedelta(milliseconds=sub_start)).split(".")[0] + " --&gt; " + str(timedelta(milliseconds=sub_end)).split(".")[0] + "\n" + transcription + "\n\n"

    return srt_text, srt_index</code></pre>
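<p>For instance, here is how the middle timestamp and the cutting index are computed for a dense 6-second segment (a standalone sketch of the logic above):</p>

```python
from datetime import timedelta

transcription = "this quite long sentence would be uncomfortable to read in a single subtitle"
sub_start, sub_end = 0, 6000  # milliseconds

# Middle timestamp, in the "H:MM:SS" format used by .srt blocks
diff = (timedelta(milliseconds=sub_end) - timedelta(milliseconds=sub_start)) / 2
middle_timestamp = str(timedelta(milliseconds=sub_start) + diff).split(".")[0]

# Cut on the space closest to the middle, so no word is split
space_indexes = [pos for pos, char in enumerate(transcription) if char == " "]
nearest_index = min(space_indexes, key=lambda x: abs(x - len(transcription) / 2))

print(middle_timestamp)
print(transcription[:nearest_index])
print(transcription[nearest_index + 1:])
```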



<p id="block-d0254180-773e-45b3-b19e-c42153f6f102"><strong>Global function which performs the whole diarization</strong> <strong>action</strong></p>



<p><em>This function simply chains all the previous diarization functions to perform the whole process</em>.</p>



<pre id="block-a4e19f59-8947-4736-93d7-89fdcc2ce9f8" class="wp-block-code"><code class="">def diarization_treatment(filename, dia_pipeline, max_space, srt_token):
    """
    Launch the whole diarization process to get speakers time intervals as pandas timedelta objects
    :param filename: name of the audio file
    :param dia_pipeline: Diarization Model (Differentiate speakers)
    :param max_space: Maximum temporal distance between two silences
    :param srt_token: Enable/Disable generate srt file (choice fixed by user)
    :return: speakers time intervals list and number of different detected speakers
    """
    
    # initialization
    diarization_timestamps = []

    # whole diarization process
    diarization, number_of_speakers = get_diarization(dia_pipeline, filename)

    if len(diarization) &gt; 0:
        diarization_timestamps = convert_str_diarlist_to_timedelta(diarization)
        diarization_timestamps = merge_speaker_times(diarization_timestamps, max_space, srt_token)
        diarization_timestamps = extending_timestamps(diarization_timestamps)

    return diarization_timestamps, number_of_speakers</code></pre>



<p id="block-76f3ec1f-9b23-4948-a124-51534febb30c"><strong>Launch diarization mode</strong></p>



<p>Previously, we were systematically running the <em>transcription_non_diarization()</em> function, which is based on the <em>silence detection method</em>.</p>



<p>But now that the user has the option to select the diarization option in the form, it is time to <strong>write our transcription_diarization() function</strong>. </p>



<p>The only difference between the two is that the silence detection step is replaced by the processing of the diarization results.</p>



<pre id="block-e9b9cf00-5ac0-4381-877c-430dc393bc79" class="wp-block-code"><code class="">def transcription_diarization(filename, diarization_timestamps, stt_model, stt_tokenizer, diarization_token, srt_token,
                              summarize_token, timestamps_token, myaudio, start, save_result, txt_text, srt_text):
    """
    Performs transcription with the diarization mode
    :param filename: name of the audio file
    :param diarization_timestamps: timestamps of each audio part (ex 10 to 50 secs)
    :param stt_model: Speech to text model
    :param stt_tokenizer: Speech to text model's tokenizer
    :param diarization_token: Differentiate or not the speakers (choice fixed by user)
    :param srt_token: Enable/Disable generate srt file (choice fixed by user)
    :param summarize_token: Summarize or not the transcript (choice fixed by user)
    :param timestamps_token: Display and save or not the timestamps (choice fixed by user)
    :param myaudio: AudioSegment file
    :param start: int value (s) given by st.slider() (fixed by user)
    :param save_result: whole process
    :param txt_text: generated .txt transcript
    :param srt_text: generated .srt transcript
    :return: results of transcribing action
    """
    # Numeric counter that identifies each sequential subtitle
    srt_index = 1

    # Handle a rare case: if there is only one segment, diarization_timestamps is a plain list, not a list of lists
    if not isinstance(diarization_timestamps[0], list):
        diarization_timestamps = [diarization_timestamps]

    # Transcribe each audio chunk (from timestamp to timestamp) and display transcript
    for index, elt in enumerate(diarization_timestamps):
        sub_start = elt[0]
        sub_end = elt[1]

        transcription = transcribe_audio_part(filename, stt_model, stt_tokenizer, myaudio, sub_start, sub_end,
                                              index)

        # Initial audio has been split with start &amp; end values
        # It begins at 0s, so the timestamps need to be adjusted by +start*1000 to compensate for the offset
        if transcription != "":
            save_result, txt_text, srt_text, srt_index = display_transcription(diarization_token, summarize_token,
                                                                    srt_token, timestamps_token,
                                                                    transcription, save_result, txt_text,
                                                                    srt_text,
                                                                    srt_index, sub_start + start * 1000,
                                                                    sub_end + start * 1000, elt)
    return save_result, txt_text, srt_text</code></pre>



<p>The <em>display_transcription()</em> function currently returns only 3 values, not the 4 we have just used in the <em>transcription_diarization()</em> function. Don&#8217;t worry, we will fix the <em>display_transcription()</em> function in a few moments.</p>



<p>You will also need the function below. It will allow the user to validate his access token to the diarization model and access the home page of our app. Indeed, we are going to create another page by default, which will invite the user to enter his token, if he wishes.</p>



<pre class="wp-block-code"><code class="">def confirm_token_change(hf_token, page_index):
    """
    A function that saves the hugging face token entered by the user.
    It also updates the page index variable so we can indicate we now want to display the home page instead of the token page
    :param hf_token: user's token
    :param page_index: number that represents the home page index (mentioned in the main.py file)
    """
    update_session_state("my_HF_token", hf_token)
    update_session_state("page_index", page_index)</code></pre>
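

<p>The <em>update_session_state()</em> helper used throughout this article was defined in an earlier part of this series. As a reminder, its behaviour can be sketched with a plain dict standing in for <em>st.session_state</em> (an assumption based on its call sites; the <em>concatenate_token</em> keyword matches how the summary text is built later on):</p>

```python
# Minimal sketch of the update_session_state() helper, with a plain
# dict standing in for st.session_state (assumption: the real helper
# behaves this way, as suggested by its call sites in this article)
session_state = {"my_HF_token": "", "page_index": 0, "summary": ""}

def update_session_state(var, data, concatenate_token=False):
    """Overwrite a session variable, or append to it when concatenate_token=True."""
    if concatenate_token:
        session_state[var] += data
    else:
        session_state[var] = data

def confirm_token_change(hf_token, page_index):
    # Same logic as the function above: save the token, then switch pages
    update_session_state("my_HF_token", hf_token)
    update_session_state("page_index", page_index)

confirm_token_change("hf_xxx", 0)
```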



<h3 class="wp-block-heading" id="block-e5d0971f-6a22-467e-ba4a-bc5e19572c83">Display the transcript correctly</h3>



<p id="block-8bf020a4-d6a8-4681-afac-6609255df1b6">Once the transcript is obtained, we must <strong>display</strong> it correctly, <strong>depending on the options</strong> the user has selected.</p>



<p id="block-4938f147-7b32-453e-81d8-09a15fb6d99b">For example, if the user has activated diarization, we need to write the identified speaker before each transcript, like the following result:</p>



<p><em>Speaker1 : &#8220;I would like a cup of tea&#8221;</em></p>



<p id="block-4938f147-7b32-453e-81d8-09a15fb6d99b">This is different from a classic <em>silences detection</em> method, which only writes the transcript, without any names!</p>



<p id="block-c5971940-a2fd-4c7c-9e52-aee8049adce4">The same applies to the timestamps: we must know whether to display them or not. This gives us <strong>4 different cases</strong>:</p>



<ul class="wp-block-list">
<li>diarization with timestamps, named <strong>DIA_TS</strong></li>



<li>diarization without timestamps, named <strong>DIA</strong></li>



<li>non_diarization with timestamps, named <strong>NODIA_TS</strong></li>



<li>non_diarization without timestamps, named <strong>NODIA</strong></li>
</ul>
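

<p>These four modes follow directly from the two boolean tokens. As a sanity check, the mapping can be expressed as a small helper (the function name is ours; the app itself sets <em>chosen_mode</em> inline in the <em>transcription()</em> function):</p>

```python
def chosen_mode_from_tokens(diarization_token, timestamps_token):
    """Map the two user options to one of the four display modes."""
    if diarization_token:
        return "DIA_TS" if timestamps_token else "DIA"
    return "NODIA_TS" if timestamps_token else "NODIA"
```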



<p id="block-0620b0c4-2413-4623-90fa-706a1c50c8d6">To display the correct elements according to the chosen mode, let&#8217;s <strong>modify the <em>display_transcription()</em> function</strong>. </p>



<p id="block-0620b0c4-2413-4623-90fa-706a1c50c8d6"><strong>Replace the old one</strong> with the following code:</p>



<pre id="block-e20b214e-af00-4c59-a5e5-6196a14445b1" class="wp-block-code"><code class="">def display_transcription(diarization_token, summarize_token, srt_token, timestamps_token, transcription, save_result, txt_text, srt_text, srt_index, sub_start, sub_end, elt=None):
    """
    Display results
    :param diarization_token: Differentiate or not the speakers (choice fixed by user)
    :param summarize_token: Summarize or not the transcript (choice fixed by user)
    :param srt_token: Enable/Disable generate srt file (choice fixed by user)
    :param timestamps_token: Display and save or not the timestamps (choice fixed by user)
    :param transcription: transcript of the considered audio
    :param save_result: whole process
    :param txt_text: generated .txt transcript
    :param srt_text: generated .srt transcript
    :param srt_index : numeric counter that identifies each sequential subtitle
    :param sub_start: start value (s) of the considered audio part to transcribe
    :param sub_end: end value (s) of the considered audio part to transcribe
    :param elt: timestamp (diarization case only, otherwise elt = None)
    """
    # Display will be different depending on the mode (dia, no dia, dia_ts, nodia_ts)
    
    # diarization mode
    if diarization_token:
        if summarize_token:
            update_session_state("summary", transcription + " ", concatenate_token=True)
        
        if not timestamps_token:
            temp_transcription = elt[2] + " : " + transcription
            st.write(temp_transcription + "\n\n")

            save_result.append([int(elt[2][-1]), elt[2], " : " + transcription])
            
        elif timestamps_token:
            temp_timestamps = str(timedelta(milliseconds=sub_start)).split(".")[0] + " --&gt; " + \
                              str(timedelta(milliseconds=sub_end)).split(".")[0] + "\n"
            temp_transcription = elt[2] + " : " + transcription
            temp_list = [temp_timestamps, int(elt[2][-1]), elt[2], " : " + transcription, int(sub_start / 1000)]
            save_result.append(temp_list)
            st.button(temp_timestamps, on_click=click_timestamp_btn, args=(sub_start,))
            st.write(temp_transcription + "\n\n")
            
            if srt_token:
                srt_text, srt_index = optimize_subtitles(transcription, srt_index, sub_start, sub_end, srt_text)


    # Non diarization case
    else:
        if not timestamps_token:
            save_result.append([transcription])
            st.write(transcription + "\n\n")
            
        else:
            temp_timestamps = str(timedelta(milliseconds=sub_start)).split(".")[0] + " --&gt; " + \
                              str(timedelta(milliseconds=sub_end)).split(".")[0] + "\n"
            temp_list = [temp_timestamps, transcription, int(sub_start / 1000)]
            save_result.append(temp_list)
            st.button(temp_timestamps, on_click=click_timestamp_btn, args=(sub_start,))
            st.write(transcription + "\n\n")
            
            if srt_token:
                srt_text, srt_index = optimize_subtitles(transcription, srt_index, sub_start, sub_end, srt_text)

        txt_text += transcription + " "  # So x seconds sentences are separated

    return save_result, txt_text, srt_text, srt_index</code></pre>
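

<p>As a side note, the timestamp labels shown on the buttons above are built with <em>datetime.timedelta</em>: the millisecond values become <em>H:MM:SS</em> strings once the microseconds are stripped by <em>.split(&#8220;.&#8221;)[0]</em>. In isolation, the expression behaves like this:</p>

```python
from datetime import timedelta

def format_timestamps(sub_start, sub_end):
    # Same expression as in display_transcription(): milliseconds in,
    # "H:MM:SS --> H:MM:SS" out (split(".")[0] drops the microseconds)
    return (str(timedelta(milliseconds=sub_start)).split(".")[0]
            + " --> "
            + str(timedelta(milliseconds=sub_end)).split(".")[0])
```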



<p id="block-0851dc0b-f347-4738-ab2d-5b9c0227c253">We also need to <strong>add the following function</strong>, which allows us to create our <em>txt_text</em> variable from the <em>st.session_state[&#8216;process&#8217;]</em> variable in a diarization case. This is necessary because, in addition to displaying the spoken sentence (the transcript itself), we must display the identity of the speaker, and possibly the timestamps, all of which are stored in the session state variable. </p>



<pre id="block-618a8493-cb76-4fe4-9fce-24287e9eb588" class="wp-block-code"><code class="">def create_txt_text_from_process(punctuation_token=False, t5_model=None, t5_tokenizer=None):
    """
    If we are in a diarization case (differentiate speakers), we create txt_text from st.session.state['process']
    There is a lot of information in the process variable, but we only extract the identity of the speaker and
    the sentence spoken, as in a non-diarization case.
    :param punctuation_token: Punctuate or not the transcript (choice fixed by user)
    :param t5_model: T5 Model (Auto punctuation model)
    :param t5_tokenizer: T5’s Tokenizer (Auto punctuation model's tokenizer)
    :return: Final transcript (without timestamps)
    """
    txt_text = ""
    # The information to be extracted is different according to the chosen mode
    if punctuation_token:
        with st.spinner("Transcription is finished! Let us punctuate your audio"):
            if st.session_state["chosen_mode"] == "DIA":
                for elt in st.session_state["process"]:
                    # [2:] don't want ": text" but only the "text"
                    text_to_punctuate = elt[2][2:]
                    if len(text_to_punctuate) &gt;= 512:
                        text_to_punctuate_list = split_text(text_to_punctuate, 512)
                        punctuated_text = ""
                        for split_text_to_punctuate in text_to_punctuate_list:
                            punctuated_text += add_punctuation(t5_model, t5_tokenizer, split_text_to_punctuate)
                    else:
                        punctuated_text = add_punctuation(t5_model, t5_tokenizer, text_to_punctuate)

                    txt_text += elt[1] + " : " + punctuated_text + '\n\n'

            elif st.session_state["chosen_mode"] == "DIA_TS":
                for elt in st.session_state["process"]:
                    text_to_punctuate = elt[3][2:]
                    if len(text_to_punctuate) &gt;= 512:
                        text_to_punctuate_list = split_text(text_to_punctuate, 512)
                        punctuated_text = ""
                        for split_text_to_punctuate in text_to_punctuate_list:
                            punctuated_text += add_punctuation(t5_model, t5_tokenizer, split_text_to_punctuate)
                    else:
                        punctuated_text = add_punctuation(t5_model, t5_tokenizer, text_to_punctuate)

                    txt_text += elt[2] + " : " + punctuated_text + '\n\n'
    else:
        if st.session_state["chosen_mode"] == "DIA":
            for elt in st.session_state["process"]:
                txt_text += elt[1] + elt[2] + '\n\n'

        elif st.session_state["chosen_mode"] == "DIA_TS":
            for elt in st.session_state["process"]:
                txt_text += elt[2] + elt[3] + '\n\n'

    return txt_text</code></pre>
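

<p>The <em>split_text()</em> helper called above keeps each chunk under the 512-character limit we use for the T5 punctuation model; it was introduced earlier in this series. If you do not have it handy, a naive sketch (ours, not the exact original) that cuts on word boundaries could look like this:</p>

```python
def split_text(my_text, max_size):
    """Naive sketch: split a string into chunks of at most max_size
    characters, cutting on spaces (assumes no single word exceeds max_size)."""
    chunks = []
    current = ""
    for word in my_text.split(" "):
        if current and len(current) + 1 + len(word) > max_size:
            chunks.append(current)
            current = word
        else:
            current = word if not current else current + " " + word
    if current:
        chunks.append(current)
    return chunks
```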



<p>Also for the purpose of correct display, we need to <strong>update the <em>display_results()</em> function</strong> so that it adapts the display to the selected mode among DIA_TS, DIA, NODIA_TS, NODIA. This will also avoid <em>&#8216;List index out of range&#8217; </em>errors, as the <em>process</em> variable does not contain the same number of elements depending on the mode used.</p>



<pre class="wp-block-code"><code class=""># Update the following function code
def display_results():

    # Add a button to return to the main page
    st.button("Load another file", on_click=update_session_state, args=("page_index", 0,))

    # Display results
    st.audio(st.session_state['audio_file'], start_time=st.session_state["start_time"])

    # Display results of transcript by steps
    if st.session_state["process"] != []:

        if st.session_state["chosen_mode"] == "NODIA":  # Non diarization, non timestamps case
            for elt in (st.session_state['process']):
                st.write(elt[0])

        elif st.session_state["chosen_mode"] == "DIA":  # Diarization without timestamps case
            for elt in (st.session_state['process']):
                st.write(elt[1] + elt[2])

        elif st.session_state["chosen_mode"] == "NODIA_TS":  # Non diarization with timestamps case
            for elt in (st.session_state['process']):
                st.button(elt[0], on_click=update_session_state, args=("start_time", elt[2],))
                st.write(elt[1])

        elif st.session_state["chosen_mode"] == "DIA_TS":  # Diarization with timestamps case
            for elt in (st.session_state['process']):
                st.button(elt[0], on_click=update_session_state, args=("start_time", elt[4],))
                st.write(elt[2] + elt[3])

    # Display final text
    st.subheader("Final text is")
    st.write(st.session_state["txt_transcript"])

    # Display Summary
    if st.session_state["summary"] != "":
        with st.expander("Summary"):
            st.write(st.session_state["summary"])

    # Display the buttons in a list to avoid having empty columns (explained in the transcription() function)
    col1, col2, col3, col4 = st.columns(4)
    col_list = [col1, col2, col3, col4]
    col_index = 0

    for elt in st.session_state["btn_token_list"]:
        if elt[0]:
            mycol = col_list[col_index]
            if elt[1] == "useless_txt_token":
                # Download your transcription.txt
                with mycol:
                    st.download_button("Download as TXT", st.session_state["txt_transcript"],
                                       file_name="my_transcription.txt")

            elif elt[1] == "srt_token":
                # Download your transcription.srt
                with mycol:
                    st.download_button("Download as SRT", st.session_state["srt_txt"], file_name="my_transcription.srt")
            elif elt[1] == "dia_token":
                with mycol:
                    # Rename the speakers detected in your audio
                    st.button("Rename Speakers", on_click=update_session_state, args=("page_index", 2,))

            elif elt[1] == "summarize_token":
                with mycol:
                    st.download_button("Download Summary", st.session_state["summary"], file_name="my_summary.txt")
            col_index += 1</code></pre>



<p>We then display <strong>4 buttons</strong> that allow you to <strong>interact with the implemented functions</strong> (download the transcript as <em>.txt</em>, download it as <em>.srt</em>, download the summary, and rename the speakers).</p>



<p>These buttons are placed in 4 columns, which allows them to be displayed on a single line. The problem is that these options are sometimes enabled and sometimes not. If we statically assigned each button to a column, we would risk leaving an empty column among the four, which would not be aesthetically pleasing.</p>



<p>This is where the <em>btn_token_list</em> comes in! It is a list of lists: each element is a pair whose first item is the token&#8217;s value and whose second is its name. For example, we can find in it the list <em>[True, &#8220;dia_token&#8221;]</em>, which means that the diarization option has been selected. </p>



<p>From this, we assign a button to a column only if its token is set to <em>True</em>. If the token is set to <em>False</em>, we keep the column available for the next token. This avoids creating an empty column.</p>
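

<p>Stripped of the Streamlit widgets, this column-filling logic reduces to walking the token list and advancing the column index only when a button has actually been placed. A minimal sketch with hypothetical data:</p>

```python
def assign_buttons_to_columns(btn_token_list):
    """Return {column_index: token_name} for enabled tokens only,
    so enabled buttons pack into the leftmost columns with no gaps."""
    layout = {}
    col_index = 0
    for enabled, name in btn_token_list:
        if enabled:
            layout[col_index] = name
            col_index += 1
    return layout

# Hypothetical example: srt disabled, so the following buttons shift left
tokens = [[True, "useless_txt_token"], [False, "srt_token"],
          [True, "dia_token"], [True, "summarize_token"]]
layout = assign_buttons_to_columns(tokens)
```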



<h3 class="wp-block-heading" id="block-a695aa41-4c72-4fb6-a38a-634ea54d5710"><strong>Rename Speakers</strong></h3>



<p id="block-d589a3b7-3887-48cb-a7fa-a95648d12309">Of course, it would be interesting to have the possibility to <strong>rename the detected speakers</strong> in the audio file. Indeed, having <em>Speaker0, Speaker1,</em> &#8230; is fine but it could be so much better with <strong>real names</strong>! Guess what? We are going to do this!</p>



<p id="block-9a0918b6-4b78-43e8-b25a-4b8b6f3bbe02">First, we will <strong>create a list</strong> where we will <strong>add each speaker with his &#8216;ID&#8217;</strong> (<em>ex: Speaker1 has 1 as his ID</em>).</p>



<p>Unfortunately, the diarization <strong>does not sort the speakers</strong>. For example, the first one detected might be Speaker3, followed by Speaker0, then Speaker2. This is why it is important to sort this list, for example by placing the lowest ID first. This ensures names are not swapped between speakers.</p>
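

<p>Since Python compares lists element by element, sorting the <em>[ID, name]</em> pairs directly is enough to order the speakers by their numeric ID:</p>

```python
# Speakers in order of first appearance in the audio...
list_of_speakers = [[3, "Speaker3"], [0, "Speaker0"], [2, "Speaker2"]]

# ...sorted by ID, so later renaming stays aligned with the right speaker
list_of_speakers.sort()
```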



<p>Once this is done, we need to find a way for the user to interact with this list and modify the names contained in it. </p>



<p>➡️ We are going to create a third page that will be dedicated to this functionality. On this page, we will display each name contained in the list in a <em>st.text_area()</em> widget. The user will be able to see how many people have been detected in his audio and the automatic names (<em>Speaker0, Speaker1,</em> &#8230;) that have been assigned to them, as the screen below shows:</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/11/rename-speakers-page.png" alt="speech to text application speakers differentiation" class="wp-image-24106" width="661" height="406" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/11/rename-speakers-page.png 885w, https://blog.ovhcloud.com/wp-content/uploads/2022/11/rename-speakers-page-300x184.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/11/rename-speakers-page-768x471.png 768w" sizes="auto, (max-width: 661px) 100vw, 661px" /><figcaption class="wp-element-caption">Overview of the <em>Rename Speakers</em> page</figcaption></figure>



<p id="block-942f97d7-aa6c-4931-8e46-efda8d3c22d3">The user is able to modify this text area. Indeed, he can replace each name with the one he wants, but he must <strong>respect the one name per line format</strong>. When he has finished, he can <strong>save his modifications</strong> by <strong>clicking a &#8220;<em>Save changes</em>&#8221; button</strong>, which <strong>calls the callback function <em>click_confirm_rename_btn</em>()</strong> that we will define just after. We also display a <strong>&#8220;<em>Cancel</em>&#8221; button</strong> that will redirect the user to the results page.</p>



<p>All this process is realized by the <em>rename_speakers_window()</em> function. <strong>Add it to your code:</strong></p>



<pre id="block-d94c23c7-cb50-4a25-9472-f1f59f798e94" class="wp-block-code"><code class="">def rename_speakers_window():
    """
    Load a new page which allows the user to rename the different speakers from the diarization process
    For example he can switch from "Speaker1 : "I wouldn't say that"" to "Mat : "I wouldn't say that""
    """

    st.subheader("Here you can rename the speakers as you want")
    number_of_speakers = st.session_state["number_of_speakers"]

    if number_of_speakers &gt; 0:
        # Handle displayed text according to the number_of_speakers
        if number_of_speakers == 1:
            st.write(str(number_of_speakers) + " speaker has been detected in your audio")
        else:
            st.write(str(number_of_speakers) + " speakers have been detected in your audio")

        # Saving the Speaker Name and its ID in a list, example : [1, 'Speaker1']
        list_of_speakers = []
        for elt in st.session_state["process"]:
            if st.session_state["chosen_mode"] == "DIA_TS":
                if [elt[1], elt[2]] not in list_of_speakers:
                    list_of_speakers.append([elt[1], elt[2]])
            elif st.session_state["chosen_mode"] == "DIA":
                if [elt[0], elt[1]] not in list_of_speakers:
                    list_of_speakers.append([elt[0], elt[1]])

        # Sorting (by ID)
        list_of_speakers.sort()  # [[1, 'Speaker1'], [0, 'Speaker0']] =&gt; [[0, 'Speaker0'], [1, 'Speaker1']]

        # Display saved names so the user can modify them
        initial_names = ""
        for elt in list_of_speakers:
            initial_names += elt[1] + "\n"

        names_input = st.text_area("Just replace the names without changing the format (one per line)",
                                   value=initial_names)

        # Display Options (Cancel / Save)
        col1, col2 = st.columns(2)
        with col1:
            # Cancel changes by clicking a button - callback function to return to the results page
            st.button("Cancel", on_click=update_session_state, args=("page_index", 1,))
        with col2:
            # Confirm changes by clicking a button - callback function to apply changes and return to the results page
            st.button("Save changes", on_click=click_confirm_rename_btn, args=(names_input, number_of_speakers, ))

    # Don't have anyone to rename
    else:
        st.error("0 speakers have been detected. It seems there is an issue with the diarization")
        with st.spinner("Redirecting to transcription page"):
            time.sleep(4)
            # return to the results page
            update_session_state("page_index", 1)</code></pre>



<p id="block-8897a8bc-7dd3-43cc-8f83-851e8a9aea8e">Now, <strong>write the callback function</strong> that is called when the <em>&#8220;Save changes&#8221;</em> button is clicked. It allows to <strong>save the new speaker&#8217;s names</strong> in the <em>process</em> session state variable and to <strong>recreate the displayed text with the new names</strong> <strong>given by the user</strong> thanks to the previously defined function <em>create_txt_text_from_process()</em>. Finally, it <strong>redirects the user to the results page</strong>.</p>



<pre id="block-b64adcbc-4c78-4451-aa88-d8fc7de54e53" class="wp-block-code"><code class="">def click_confirm_rename_btn(names_input, number_of_speakers):
    """
    If the user decides to rename speakers and confirms his choices, we apply the modifications to our transcript
    Then we return to the results page of the app
    :param names_input: string
    :param number_of_speakers: Number of detected speakers in the audio file
    """

    try:
        names_input = names_input.split("\n")[:number_of_speakers]

        for elt in st.session_state["process"]:
            elt[2] = names_input[elt[1]]

        txt_text = create_txt_text_from_process()
        update_session_state("txt_transcript", txt_text)
        update_session_state("page_index", 1)

    except TypeError:  # list indices must be integers or slices, not str (happened to me one time when writing non sense names)
        st.error("Please respect the 1 name per line format")
        with st.spinner("We are relaunching the page"):
            time.sleep(3)
            update_session_state("page_index", 1)</code></pre>



<h3 class="wp-block-heading" id="block-b681491a-4806-4d35-a6ed-5b6bc70dc839">Create subtitles for videos (.SRT)</h3>



<p>The idea here is very simple; the process is the same as before. In this case, we just have to <strong>shorten the timestamps</strong> by adjusting the <em>min_space</em> and <em>max_space</em> values, so that we get a <strong>good video-subtitles synchronization</strong>.</p>



<p id="block-f5aaab26-bda2-468f-8ea0-15bbcc374fda">Indeed, remember that <strong>subtitles must correspond to small time windows</strong> to have <strong>small synchronized transcripts</strong>. Otherwise, there will be too much text. That&#8217;s why we <strong>set the <em>min_space</em> to 1s and the <em>max_space</em> to 8s</strong> instead of the classic min: 25s and max: 45s values.</p>



<pre id="block-6889eddb-3ada-4d53-9d6a-6c132539ff62" class="wp-block-code"><code class="">def silence_mode_init(srt_token):
    """
    Fix min_space and max_space values
    If the user wants a srt file, we need to have tiny timestamps
    :param srt_token: Enable/Disable generate srt file option (choice fixed by user)
    :return: min_space and max_space values
    """

    if srt_token:
        # We need short intervals if we want a short text
        min_space = 1000  # 1 sec
        max_space = 8000  # 8 secs

    else:

        min_space = 25000  # 25 secs
        max_space = 45000  # 45secs
    return min_space, max_space</code></pre>
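

<p>For reference, the <em>optimize_subtitles()</em> function used above accumulates standard SRT blocks: a sequence number, a <em>HH:MM:SS,mmm --&gt; HH:MM:SS,mmm</em> time range, then the text. A simplified sketch of how one such block can be formatted (our own helper, assuming millisecond inputs, not the original implementation):</p>

```python
from datetime import timedelta

def srt_block(srt_index, sub_start, sub_end, text):
    """Format one SRT subtitle entry (times given in milliseconds)."""
    def fmt(ms):
        # SRT timecodes are zero-padded and use a comma before milliseconds
        hours, rem = divmod(int(timedelta(milliseconds=ms).total_seconds()), 3600)
        minutes, seconds = divmod(rem, 60)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms % 1000:03d}"
    return f"{srt_index}\n{fmt(sub_start)} --> {fmt(sub_end)}\n{text}\n\n"
```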



<h3 class="wp-block-heading">Update old code</h3>



<p id="block-8e4448c3-d843-4e10-a357-7784ef86c67c">As we now have a lot of <strong>new parameters </strong><em>(diarization_token, timestamps_token, summarize_token, &#8230;)</em> in our <em>display_transcription() </em>function, we need to <strong>update our <em>transcription_non_diarization()</em> function </strong>so it can pass these new parameters along and display the transcript correctly.</p>



<pre id="block-4ef2e66a-16f6-4bf8-9777-2cf050df76a6" class="wp-block-code"><code class="">def transcription_non_diarization(filename, myaudio, start, end, diarization_token, timestamps_token, srt_token,
                                  summarize_token, stt_model, stt_tokenizer, min_space, max_space, save_result,
                                  txt_text, srt_text):
    """
    Performs transcribing action with the non-diarization mode
    :param filename: name of the audio file
    :param myaudio: AudioSegment file
    :param start: int value (s) given by st.slider() (fixed by user)
    :param end: int value (s) given by st.slider() (fixed by user)
    :param diarization_token: Differentiate or not the speakers (choice fixed by user)
    :param timestamps_token: Display and save or not the timestamps (choice fixed by user)
    :param srt_token: Enable/Disable generate srt file (choice fixed by user)
    :param summarize_token: Summarize or not the transcript (choice fixed by user)
    :param stt_model: Speech to text model
    :param stt_tokenizer: Speech to text model's tokenizer
    :param min_space: Minimum temporal distance between two silences
    :param max_space: Maximum temporal distance between two silences
    :param save_result: whole process
    :param txt_text: generated .txt transcript
    :param srt_text: generated .srt transcript
    :return: results of transcribing action
    """

    # Numeric counter identifying each sequential subtitle
    srt_index = 1

    # get silences
    silence_list = detect_silences(myaudio)
    if silence_list != []:
        silence_list = get_middle_silence_time(silence_list)
        silence_list = silences_distribution(silence_list, min_space, max_space, start, end, srt_token)
    else:
        silence_list = generate_regular_split_till_end(silence_list, int(end), min_space, max_space)

    # Transcribe each audio chunk (from timestamp to timestamp) and display transcript
    for i in range(0, len(silence_list) - 1):
        sub_start = silence_list[i]
        sub_end = silence_list[i + 1]

        transcription = transcribe_audio_part(filename, stt_model, stt_tokenizer, myaudio, sub_start, sub_end, i)

        # Initial audio has been split with start &amp; end values
        # It begins at 0s, so the timestamps need to be shifted by +start*1000 ms to compensate for the offset
        if transcription != "":
            save_result, txt_text, srt_text, srt_index = display_transcription(diarization_token, summarize_token,
                                                                    srt_token, timestamps_token,
                                                                    transcription, save_result,
                                                                    txt_text,
                                                                    srt_text,
                                                                    srt_index, sub_start + start * 1000,
                                                                    sub_end + start * 1000)

    return save_result, txt_text, srt_text</code></pre>



<p id="block-7f6d7de0-2aca-44d7-9854-9f86022a8fbc"><strong>Also, you need to add these new parameters</strong> to the <em>transcript_from_url()</em> and <em>transcript_from_file()</em> functions.</p>



<pre id="block-0fd5d977-eb66-436f-87a0-a0f564ae3268" class="wp-block-code"><code class="">def transcript_from_url(stt_tokenizer, stt_model, t5_tokenizer, t5_model, summarizer, dia_pipeline):
    """
    Displays a text input area, where the user can enter a YouTube URL link. If the link seems correct, we try to
    extract the audio from the video, and then transcribe it.

    :param stt_tokenizer: Speech to text model's tokenizer
    :param stt_model: Speech to text model
    :param t5_tokenizer: Auto punctuation model's tokenizer
    :param t5_model: Auto punctuation model
    :param summarizer: Summarizer model
    :param dia_pipeline: Diarization Model (Differentiate speakers)
    """

    url = st.text_input("Enter the YouTube video URL then press Enter to confirm!")
    # If link seems correct, we try to transcribe
    if "youtu" in url:
        filename = extract_audio_from_yt_video(url)
        if filename is not None:
            transcription(stt_tokenizer, stt_model, t5_tokenizer, t5_model, summarizer, dia_pipeline, filename)
        else:
            st.error("We were unable to extract the audio. Please verify your link, retry or choose another video")</code></pre>



<pre id="block-98c50033-3932-4b8c-8baa-647defb92303" class="wp-block-code"><code class="">def transcript_from_file(stt_tokenizer, stt_model, t5_tokenizer, t5_model, summarizer, dia_pipeline):
    """
    Displays a file uploader area, where the user can import his own file (mp3, mp4 or wav). If the file format seems
    correct, we transcribe the audio.

    :param stt_tokenizer: Speech to text model's tokenizer
    :param stt_model: Speech to text model
    :param t5_tokenizer: Auto punctuation model's tokenizer
    :param t5_model: Auto punctuation model
    :param summarizer: Summarizer model
    :param dia_pipeline: Diarization Model (Differentiate speakers)
    """

    # File uploader widget with a callback function, so the page reloads if the users uploads a new audio file
    uploaded_file = st.file_uploader("Upload your file! It can be a .mp3, .mp4 or .wav", type=["mp3", "mp4", "wav"],
                                     on_change=update_session_state, args=("page_index", 0,))

    if uploaded_file is not None:
        # get name and launch transcription function
        filename = uploaded_file.name
        transcription(stt_tokenizer, stt_model, t5_tokenizer, t5_model, summarizer, dia_pipeline, filename, uploaded_file)</code></pre>



<p id="block-63b4dd8e-151e-4eb6-9588-d71f7ad078f8">Everything is almost ready, you can finally <strong>update the <em>transcription() </em>function</strong> so it can <strong>call all the new methods we have defined</strong>:</p>



<pre id="block-44999f30-2a48-4245-8b77-a177e94ecc97" class="wp-block-code"><code class="">def transcription(stt_tokenizer, stt_model, t5_tokenizer, t5_model, summarizer, dia_pipeline, filename,
                  uploaded_file=None):
    """
    Mini-main function
    Display options, transcribe an audio file and save results.
    :param stt_tokenizer: Speech to text model's tokenizer
    :param stt_model: Speech to text model
    :param t5_tokenizer: Auto punctuation model's tokenizer
    :param t5_model: Auto punctuation model
    :param summarizer: Summarizer model
    :param dia_pipeline: Diarization Model (Differentiate speakers)
    :param filename: name of the audio file
    :param uploaded_file: file / name of the audio file which allows the code to reach the file
    """

    # If the audio comes from the Youtube extraction mode, the audio is downloaded so the uploaded_file is
    # the same as the filename. We need to change the uploaded_file which is currently set to None
    if uploaded_file is None:
        uploaded_file = filename

    # Get audio length of the file(s)
    myaudio = AudioSegment.from_file(uploaded_file)
    audio_length = myaudio.duration_seconds

    # Save Audio (so we can display it on another page ("DISPLAY RESULTS"), otherwise it is lost)
    update_session_state("audio_file", uploaded_file)

    # Display audio file
    st.audio(uploaded_file)

    # Is transcription possible
    if audio_length &gt; 0:

        # We display options and user shares his wishes
        transcript_btn, start, end, diarization_token, punctuation_token, timestamps_token, srt_token, summarize_token, choose_better_model = load_options(
            int(audio_length), dia_pipeline)

        # If end value hasn't been changed, we fix it to the max value so we don't cut some ms of the audio because
        # end value is returned by a st.slider which return end value as a int (ex: return 12 sec instead of end=12.9s)
        if end == int(audio_length):
            end = audio_length

        # Switching model for the better one
        if choose_better_model:
            with st.spinner("We are loading the better model. Please wait..."):

                try:
                    stt_tokenizer = pickle.load(open("models/STT_tokenizer2_wav2vec2-large-960h-lv60-self.sav", 'rb'))
                except FileNotFoundError:
                    stt_tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")

                try:
                    stt_model = pickle.load(open("models/STT_model2_wav2vec2-large-960h-lv60-self.sav", 'rb'))
                except FileNotFoundError:
                    stt_model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")

        # Validate options and launch the transcription process thanks to the form's button
        if transcript_btn:

            # Check if start &amp; end values are correct
            start, end = correct_values(start, end, audio_length)

            # If start a/o end value(s) has/have changed, we trim/cut the audio according to the new start/end values.
            if start != 0 or end != audio_length:
                myaudio = myaudio[start * 1000:end * 1000]  # Works in milliseconds (*1000)

            # Transcribe process is running
            with st.spinner("We are transcribing your audio. Please wait"):

                # Initialize variables
                txt_text, srt_text, save_result = init_transcription(start, int(end))
                min_space, max_space = silence_mode_init(srt_token)

                # Differentiate speakers mode
                if diarization_token:

                    # Save mode chosen by user, to display expected results
                    if not timestamps_token:
                        update_session_state("chosen_mode", "DIA")
                    elif timestamps_token:
                        update_session_state("chosen_mode", "DIA_TS")

                    # Convert mp3/mp4 to wav (Differentiate speakers mode only accepts wav files)
                    if filename.endswith((".mp3", ".mp4")):
                        myaudio, filename = convert_file_to_wav(myaudio, filename)
                    else:
                        filename = "../data/" + filename
                        myaudio.export(filename, format="wav")

                    # Differentiate speakers process
                    diarization_timestamps, number_of_speakers = diarization_treatment(filename, dia_pipeline,
                                                                                       max_space, srt_token)
                    # Saving the number of detected speakers
                    update_session_state("number_of_speakers", number_of_speakers)

                    # Transcribe process with Diarization Mode
                    save_result, txt_text, srt_text = transcription_diarization(filename, diarization_timestamps,
                                                                                stt_model,
                                                                                stt_tokenizer,
                                                                                diarization_token,
                                                                                srt_token, summarize_token,
                                                                                timestamps_token, myaudio, start,
                                                                                save_result,
                                                                                txt_text, srt_text)

                # Non Diarization Mode
                else:
                    # Save mode chosen by user, to display expected results
                    if not timestamps_token:
                        update_session_state("chosen_mode", "NODIA")
                    elif timestamps_token:
                        update_session_state("chosen_mode", "NODIA_TS")

                    filename = "../data/" + filename
                    # Transcribe process with non Diarization Mode
                    save_result, txt_text, srt_text = transcription_non_diarization(filename, myaudio, start, end,
                                                                                    diarization_token, timestamps_token,
                                                                                    srt_token, summarize_token,
                                                                                    stt_model, stt_tokenizer,
                                                                                    min_space, max_space,
                                                                                    save_result, txt_text, srt_text)

                # Save results so it is not lost when we interact with a button
                update_session_state("process", save_result)
                update_session_state("srt_txt", srt_text)

                # Get final text (with or without punctuation token)
                # Diarization Mode
                if diarization_token:
                    # Create txt text from the process
                    txt_text = create_txt_text_from_process(punctuation_token, t5_model, t5_tokenizer)

                # Non diarization Mode
                else:

                    if punctuation_token:
                        # Need to split the text by 512 text blocks size since the model has a limited input
                        with st.spinner("Transcription is finished! Let us punctuate your audio"):
                            my_split_text_list = split_text(txt_text, 512)
                            txt_text = ""
                            # punctuate each text block
                            for my_split_text in my_split_text_list:
                                txt_text += add_punctuation(t5_model, t5_tokenizer, my_split_text)

                # Clean folder's files
                clean_directory("../data")

                # Display the final transcript
                if txt_text != "":
                    st.subheader("Final text is")

                    # Save txt_text and display it
                    update_session_state("txt_transcript", txt_text)
                    st.markdown(txt_text, unsafe_allow_html=True)

                    # Summarize the transcript
                    if summarize_token:
                        with st.spinner("We are summarizing your audio"):
                            # Display summary in a st.expander widget so we don't write too much text on the page
                            with st.expander("Summary"):
                                # Need to split the text by 1024 text blocks size since the model has a limited input
                                if diarization_token:
                                    # in diarization mode, the text to summarize is contained in the "summary" session state variable
                                    my_split_text_list = split_text(st.session_state["summary"], 1024)
                                else:
                                    # in non-diarization mode, it is contained in the txt_text variable
                                    my_split_text_list = split_text(txt_text, 1024)

                                summary = ""
                                # Summarize each text block
                                for my_split_text in my_split_text_list:
                                    summary += summarizer(my_split_text)[0]['summary_text']

                                # Removing multiple spaces and double spaces around punctuation mark " . "
                                summary = re.sub(' +', ' ', summary)
                                summary = re.sub(r'\s+([?.!"])', r'\1', summary)

                                # Display summary and save it
                                st.write(summary)
                                update_session_state("summary", summary)

                    # Display buttons to interact with results

                    # We have 4 possible buttons depending on the user's choices, but we can't simply create 4
                    # columns for 4 buttons. Indeed, if the user enables only 3 of them, one of the columns
                    # 1, 2 or 3 would stay empty, which would be ugly. We want the activated options to fill the
                    # first columns so that the empty ones are not noticed. To do that, let's create a btn_token_list

                    btn_token_list = [[diarization_token, "dia_token"], [True, "useless_txt_token"],
                                      [srt_token, "srt_token"], [summarize_token, "summarize_token"]]

                    # Save this list to be able to reach it on the other pages of the app
                    update_session_state("btn_token_list", btn_token_list)

                    # Create 4 columns
                    col1, col2, col3, col4 = st.columns(4)

                    # Create a column list
                    col_list = [col1, col2, col3, col4]

                    # Check value of each token, if True, we put the respective button of the token in a column
                    col_index = 0
                    for elt in btn_token_list:
                        if elt[0]:
                            mycol = col_list[col_index]
                            if elt[1] == "useless_txt_token":
                                # Download your transcript.txt
                                with mycol:
                                    st.download_button("Download as TXT", txt_text, file_name="my_transcription.txt",
                                                       on_click=update_session_state, args=("page_index", 1,))
                            elif elt[1] == "srt_token":
                                # Download your transcript.srt
                                with mycol:
                                    update_session_state("srt_token", srt_token)
                                    st.download_button("Download as SRT", srt_text, file_name="my_transcription.srt",
                                                       on_click=update_session_state, args=("page_index", 1,))
                            elif elt[1] == "dia_token":
                                with mycol:
                                    # Rename the speakers detected in your audio
                                    st.button("Rename Speakers", on_click=update_session_state, args=("page_index", 2,))

                            elif elt[1] == "summarize_token":
                                with mycol:
                                    # Download the summary of your transcript.txt
                                    st.download_button("Download Summary", st.session_state["summary"],
                                                       file_name="my_summary.txt",
                                                       on_click=update_session_state, args=("page_index", 1,))
                            col_index += 1

                else:
                    st.write("Transcription impossible, a problem occurred with your audio or your parameters, "
                             "we apologize :(")

    else:
        st.error("Seems your audio is 0 s long, please change your file")
        time.sleep(3)
        st.stop()</code></pre>
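<p>The two <em>re.sub()</em> cleanup steps applied to the summary above can be tried in isolation. Here is a minimal, standalone sketch (the <em>tidy_summary</em> helper name is ours, not part of the app):</p>

```python
import re

def tidy_summary(summary: str) -> str:
    """Collapse repeated spaces, then remove the spaces left before punctuation.

    Mirrors the two re.sub() cleanup steps used on the generated summary.
    """
    summary = re.sub(' +', ' ', summary)              # "a  b" -> "a b"
    summary = re.sub(r'\s+([?.!"])', r'\1', summary)  # "end ." -> "end."
    return summary

print(tidy_summary('The  meeting went well . Everyone agreed !'))
# -> The meeting went well. Everyone agreed!
```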



<p id="block-c336e842-7a4e-48a6-ad47-c819523e6ef0">Finally, <strong>update the main code</strong> of the python file, which allows us to <strong>navigate between the different pages of our application</strong> (<em>token,</em> <em>home</em>, <em>results</em> and <em>rename pages</em>):</p>



<pre id="block-d623da5a-36b6-416a-a18c-79d7f2eb82fc" class="wp-block-code"><code class="">from app import *

if __name__ == '__main__':
    config()

    if st.session_state['page_index'] == -1:
        # Specify token page (mandatory to use the diarization option)
        st.warning('You must specify a token to use the diarization model. Otherwise, the app will be launched without this model. You can learn how to create your token here: https://huggingface.co/pyannote/speaker-diarization')
        text_input = st.text_input("Enter your Hugging Face token:", placeholder="ACCESS_TOKEN_GOES_HERE", type="password")

        # Confirm or continue without the option
        col1, col2 = st.columns(2)

        # save changes button
        with col1:
            confirm_btn = st.button("I have changed my token", on_click=confirm_token_change, args=(text_input, 0), disabled=st.session_state["disable"])
            # if text is changed, button is clickable
            if text_input != "ACCESS_TOKEN_GOES_HERE":
                st.session_state["disable"] = False

        # Continue without a token (there will be no diarization option)
        with col2:
            dont_mind_btn = st.button("Continue without this option", on_click=update_session_state, args=("page_index", 0))

    if st.session_state['page_index'] == 0:
        # Home page
        choice = st.radio("Features", ["By a video URL", "By uploading a file"])

        stt_tokenizer, stt_model, t5_tokenizer, t5_model, summarizer, dia_pipeline = load_models()

        if choice == "By a video URL":
            transcript_from_url(stt_tokenizer, stt_model, t5_tokenizer, t5_model, summarizer, dia_pipeline)

        elif choice == "By uploading a file":
            transcript_from_file(stt_tokenizer, stt_model, t5_tokenizer, t5_model, summarizer, dia_pipeline)

    elif st.session_state['page_index'] == 1:
        # Results page
        display_results()

    elif st.session_state['page_index'] == 2:
        # Rename speakers page
        rename_speakers_window()</code></pre>



<p>The idea is the following: </p>



<p>The user arrives at <em>the token page</em> (whose index is -1), where a <em>st.warning()</em> invites him to enter his diarization access token into a <em>text_input()</em> widget. He can then enter his token and click the <em>confirm_btn</em>, which becomes clickable once the text has changed. But he can also choose not to use this option by clicking on the <em>dont_mind</em> button. In both cases, the variable <em>page_index</em> will be updated to 0, and the application will then display the <em>home page</em> that will allow the user to transcribe his files.</p>



<p>Following this logic, the session variable <em>page_index</em> must <strong>no longer be initialized to 0 </strong>(index of the <em>home page</em>), but <strong>to -1</strong>, in order to load the <em>token page</em> first. For that, <strong>modify its initialization in the <em>config()</em> function</strong>:</p>



<pre class="wp-block-code"><code class=""># Modify the page_index initialization in the config() function

def config(): 

    # .... 

    if 'page_index' not in st.session_state:
        st.session_state['page_index'] = -1 </code></pre>



<h3 class="wp-block-heading">Conclusion</h3>



<p>Congratulations! Your Speech to Text Application is now full of features. Now it&#8217;s time to have fun with it! </p>



<p>You can transcribe audio files, videos, with or without punctuation. You can also generate synchronized subtitles. You have also discovered how to differentiate speakers thanks to diarization, in order to follow a conversation more easily.</p>



<p>➡️ To <strong>significantly reduce the initialization time</strong> of the app and the <strong>execution time of the transcription</strong>, we recommend that you deploy your speech to text app on powerful GPU resources with <strong>AI Deploy</strong>. To learn how to do it, please refer to&nbsp;<a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/tuto-streamlit-speech-to-text-app/" target="_blank" rel="noreferrer noopener" data-wpel-link="exclude">this documentation</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to build a Speech-To-Text Application with Python (2/3)</title>
		<link>https://blog.ovhcloud.com/how-to-build-a-speech-to-text-application-with-python-2-3/</link>
		
		<dc:creator><![CDATA[Mathieu Busquet]]></dc:creator>
		<pubDate>Wed, 14 Dec 2022 09:26:39 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Apps]]></category>
		<category><![CDATA[AI Solutions]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[PyTorch]]></category>
		<category><![CDATA[Streamlit]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=23283</guid>

					<description><![CDATA[A tutorial to create and build your own Speech-To-Text Application with Python. At the end of this second article, your Speech-To-Text application will be more interactive and visually better. Indeed, we are going to center our titles and justify our transcript. We will also add some useful buttons (to download the transcript, to play with [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em>A tutorial to create and build your own <strong>Speech-To-Text Application</strong></em> with Python.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="576" src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-2-1024x576.png" alt="speech to text app image2" class="wp-image-24059" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-2-1024x576.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-2-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-2-768x432.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-2-1536x864.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-2.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>At the end of this second article, your Speech-To-Text application will be <strong>more interactive</strong> and<strong> visually better</strong>. </p>



<p>Indeed, we are going to <strong>center</strong> our titles and <strong>justify</strong> our transcript. We will also add some useful <strong>buttons</strong> (to download the transcript, to play with the timestamps). Finally, we will prepare the application for the next tutorial by displaying <strong>sliders and checkboxes</strong> to interact with the next functionalities (speaker differentiation, summarization, video subtitles generation, &#8230;)</p>



<p><em>Final code of the app is available in our dedicated <a href="https://github.com/ovh/ai-training-examples/tree/main/apps/streamlit/speech-to-text" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>.</em></p>



<h3 class="wp-block-heading">Overview of our final app</h3>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="575" src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview-1024x575.png" alt="speech to text streamlit app" class="wp-image-23277" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview-1024x575.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview-768x432.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview-1536x863.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p class="has-text-align-center"><em>Overview of our final Speech-To-Text application</em></p>



<h3 class="wp-block-heading">Objective</h3>



<p>In the <a href="https://blog.ovhcloud.com/how-to-build-a-speech-to-text-application-with-python-1-3/" data-wpel-link="internal">previous article</a>, we have seen how to build a <strong>basic</strong> <strong>Speech-To-Text application</strong> with <em>Python</em> and <em>Streamlit</em>. In this tutorial, we will <strong>improve </strong>this application by <strong>changing its appearance</strong>, <strong>improving its interactivity</strong> and <strong>preparing features</strong> used in the <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/natural-language-processing/speech-to-text/miniconda" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">notebooks</a> (transcribe a specific audio part, differentiate speakers, generate video subtitles, punctuate and summarize the transcript, &#8230;) that we will implement in the last tutorial! </p>



<p>This article is organized as follows:</p>



<ul class="wp-block-list">
<li>Python libraries</li>



<li>Change appearance with CSS</li>



<li>Improve the app&#8217;s interactivity</li>



<li>Prepare new functionalities</li>
</ul>



<p><em>⚠️ Since this article uses code already explained in the previous <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/natural-language-processing/speech-to-text/miniconda" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">notebook tutorials</a>, we will not re-explain it in detail here. We therefore recommend that you read the notebooks first.</em></p>



<h4 class="wp-block-heading">1. Python libraries</h4>



<p>To implement our final features (speaker differentiation, summarization, &#8230;) in our speech to text app, we need to <strong>import</strong> the following libraries into our <em>app.py</em> file. We will use them afterwards.</p>



<pre class="wp-block-code"><code class=""># Models
from pyannote.audio import Pipeline
from transformers import pipeline, HubertForCTC, T5Tokenizer, T5ForConditionalGeneration, Wav2Vec2ForCTC, Wav2Vec2Processor, Wav2Vec2Tokenizer
import pickle

# Others
import pandas as pd
import re</code></pre>



<h4 class="wp-block-heading">2. Change appearance with CSS</h4>



<p>Before adding or modifying anything, let&#8217;s <strong>improve the appearance</strong> of our application!</p>



<p>😕 Indeed, you may have noticed that our <strong>transcript is not justified</strong>, <strong>titles are not centered</strong> and there is an <strong>unnecessary space</strong> at the top of the screen.</p>



<p>➡️ To solve this, let&#8217;s use the <em>st.markdown()</em> function to write some <strong>CSS code</strong> inside a &#8220;<em>&lt;style&gt;</em>&#8221; tag! </p>



<p>Just <strong>add the following lines to the <em>config()</em> function</strong> we have created before, for example after the <em>st.title(&#8220;Speech to Text App 📝&#8221;)</em> line. This will tell <em>Streamlit</em> how it should display the mentioned elements.</p>



<pre class="wp-block-code"><code class="">    st.markdown("""
                    &lt;style&gt;
                    .block-container.css-12oz5g7.egzxvld2{
                        padding: 1%;}
                   
                    .stRadio &gt; label:nth-child(1){
                        font-weight: bold;
                        }
                    .stRadio &gt; div{flex-direction:row;}
                    p, span{ 
                        text-align: justify;
                    }
                    span{ 
                        text-align: center;
                    }
                    &lt;/style&gt;
                    """, unsafe_allow_html=True)</code></pre>



<p>We set the parameter &#8220;<em>unsafe_allow_html</em>&#8221; to &#8220;<em>True</em>&#8221; because HTML tags are escaped by default and therefore treated as pure text. Setting this argument to True turns off this behavior.</p>
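<p>To see what escaping means in practice, here is a small sketch using Python&#8217;s standard <em>html</em> module as an analogy (this is not Streamlit&#8217;s internal implementation, just an illustration of the behaviour):</p>

```python
import html

snippet = "<style>p { text-align: justify; }</style>"

# Escaped (comparable to the default behaviour): the markup is shown as plain text
escaped = html.escape(snippet)
print(escaped)  # &lt;style&gt;p { text-align: justify; }&lt;/style&gt;

# With unsafe_allow_html=True, the raw markup is passed through untouched,
# so the browser can interpret the <style> block instead of displaying it
print(snippet)
```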



<p class="has-text-align-left">⬇️ Let&#8217;s look at the result:</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:100%">
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:100%">
<figure class="wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex">
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1554" height="873" data-id="23292" src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/without_css-edited.png" alt="speech to text streamlit application without css" class="wp-image-23292" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/07/without_css-edited.png 1554w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/without_css-edited-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/without_css-edited-1024x575.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/without_css-edited-768x431.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/without_css-edited-1536x863.png 1536w" sizes="auto, (max-width: 1554px) 100vw, 1554px" /><figcaption class="wp-element-caption"><br></figcaption></figure>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1630" height="915" data-id="23291" src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/with_css-edited.png" alt="speech to text streamlit application with css" class="wp-image-23291" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/07/with_css-edited.png 1630w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/with_css-edited-300x168.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/with_css-edited-1024x575.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/with_css-edited-768x431.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/with_css-edited-1536x862.png 1536w" sizes="auto, (max-width: 1630px) 100vw, 1630px" /></figure>
</figure>
</div>
</div>
</div>
</div>



<p class="has-text-align-center"><em>App without CSS (on the left) and with (on the right)</em></p>



<p>This is better now, isn&#8217;t it? </p>



<h4 class="wp-block-heading">3. Improve the app&#8217;s interactivity</h4>



<p>Now we will let the user <strong>interact</strong> with our application. It will no longer only generate a transcript.</p>



<p><strong>3.1 Download transcript</strong></p>



<p>We want the users to be able to <strong>download the generated transcript as a text file</strong>. This will save them from having to copy the transcript and paste it into a text file. We can do this easily with a <strong>download button widget</strong><em>.</em></p>



<p>Unfortunately, <em>Streamlit</em> does not make this easy. Indeed, <strong>each time you interact with a button</strong> on the page, the entire <em>Streamlit</em> <strong>script is re-run</strong>, which <strong>deletes our displayed transcript</strong>. To observe this problem, <strong>add a download button </strong>to the app<strong>,</strong> just after the <em>st.write(txt_text)</em> line in the <em>transcription()</em> function, thanks to the following code:</p>



<pre class="wp-block-code"><code class=""># Download transcript button - Add it to the transcription() function, after the st.write(txt_text) line
st.download_button("Download as TXT", txt_text, file_name="my_transcription.txt")</code></pre>



<p>Now, if you transcribe an audio file, you should see a download button at the bottom of the transcript, and if you click it, you will get the transcript in .txt format as expected. But you will notice that the<strong> whole transcript disappears</strong> <strong>for no apparent reason</strong>, which is frustrating for the user, as the video below shows: </p>



<figure class="wp-block-video aligncenter"><video height="720" style="aspect-ratio: 1280 / 720;" width="1280" controls src="https://blog.ovhcloud.com/wp-content/uploads/2022/11/speech_to_text_app_click_button_issue.mp4"></video></figure>



<p class="has-text-align-center"><em>Video illustrating the issue with Streamlit button widgets</em></p>



<p>To solve this, we are going to use <strong><em>Streamlit</em> <em>session state</em></strong> and <strong><em>callback functions</em></strong>. Indeed, session state is a way to share variables between reruns. Since <em>Streamlit</em> reruns the app&#8217;s script when we click a button, this is the perfect solution!</p>
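<p>Conceptually, session state behaves like a dictionary that survives reruns, while ordinary variables are reset each time. The following pure-Python sketch (not real <em>Streamlit</em> code, just an illustration) emulates two reruns:</p>

```python
# Conceptual sketch (not real Streamlit): st.session_state survives reruns,
# ordinary local variables do not.
session_state = {}

def run_script():
    """Emulate one Streamlit rerun: locals start fresh, session_state persists."""
    txt_text = ""  # local variable: reset on every rerun
    if "clicks" not in session_state:
        session_state["clicks"] = 0
    session_state["clicks"] += 1
    return txt_text, session_state["clicks"]

run_script()                     # first run of the script
txt_text, clicks = run_script()  # rerun triggered by e.g. a button click
print(clicks)          # 2  -> the counter persisted across reruns
print(repr(txt_text))  # '' -> the local variable was reset
```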



<p>➡️ First, let&#8217;s <strong>initialize four session state variables </strong>respectively called <em>audio_file</em>, <em>process, txt_transcript</em> and <em>page_index</em>. </p>



<p><em>As the session state variables are initialized only once in the code, we can <strong>initialize them all at once</strong> as below</em>, <strong>in the <em>config()</em> function</strong>:</p>



<pre class="wp-block-code"><code class=""># Initialize session state variables
# Should be added to the config() function 
if 'page_index' not in st.session_state:
    st.session_state['audio_file'] = None
    st.session_state["process"] = []
    st.session_state['txt_transcript'] = ""
    st.session_state["page_index"] = 0</code></pre>



<p>The first one allows us to <strong>save the audio file </strong>of the user. Then<em>, </em>the<em> process</em> variable (which is a list) will <strong>contain each generated transcript part with its associated timestamps</strong>, while the third variable will only contain the concatenated transcripts, which means the <strong>final text</strong>.</p>



<p>The last variable, <em>page_index</em>, will <strong>determine which page of our application will be displayed</strong> according to its value. Indeed, since clicking the download button removes the displayed transcript, we are going to <strong>create a second page</strong>, named <strong>results page</strong>, where we will <strong>display again</strong> the user&#8217;s audio file and the obtained transcript <strong>thanks to the values saved in our session state variables</strong>. We can then <strong>redirect the user</strong> to this second page as soon as the user <strong>clicks the download button</strong>. This will allow the user to always be able to see his transcript, even if he downloads it!</p>



<p>➡️ Once we have initialized the session state variables, we need to <strong>save</strong> the transcript with the associated timestamps and the final text in <strong>these variables</strong> so we do not lose this information when we click a button.</p>



<p>To do that, we need to <strong>define an <em>update_session_state()</em> function</strong> which will allow us to <strong>update our session state variables</strong>, either by <strong>replacing</strong> their content, or by <strong>concatenating</strong> it, which will be interesting for the transcripts since they are obtained step by step. Indeed, <strong>concatenating each transcript part will allow us to obtain the final transcript</strong>. Here is the function:</p>



<pre class="wp-block-code"><code class="">def update_session_state(var, data, concatenate_token=False):
    """
    A simple function to update a session state variable
    :param var: variable's name
    :param data: new value of the variable
    :param concatenate_token: do we replace or concatenate
    """

    if concatenate_token:
        st.session_state[var] += data
    else:
        st.session_state[var] = data</code></pre>
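<p>To see the replace-versus-concatenate behaviour in isolation, here is a minimal sketch where a plain dictionary stands in for <em>st.session_state</em>:</p>

```python
# Plain dict standing in for st.session_state (illustration only)
session_state = {"txt_transcript": ""}

def update_session_state(var, data, concatenate_token=False):
    """Replace a stored value, or concatenate to it when the token is set."""
    if concatenate_token:
        session_state[var] += data
    else:
        session_state[var] = data

# Transcript parts arrive step by step and are concatenated...
update_session_state("txt_transcript", "Hello ", concatenate_token=True)
update_session_state("txt_transcript", "world.", concatenate_token=True)
print(session_state["txt_transcript"])  # Hello world.

# ...while simple values such as the page index are replaced
update_session_state("page_index", 1)
print(session_state["page_index"])  # 1
```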



<p>This is where we will use the variable<em> save_result</em> from the previous article. Actually, <em>save_result </em>is a list which contains the timestamps and the generated transcript. This corresponds to what we want in the <em>process</em> state variable, which will allow us to retrieve the transcripts and associated timestamps and display them on our results page! </p>



<pre class="wp-block-code"><code class="">### Add this line to the transcription() function, after the transcription_non_diarization() call

# Save results
update_session_state("process", save_result)</code></pre>



<p>Let&#8217;s do the same with the <em>audio_file</em> and <em>txt_text</em> variables, so we can also re-display the audio player and the final text on our results page.</p>



<pre class="wp-block-code"><code class="">### Add this line to the transcription() function, after the st.audio(uploaded_file) line, to save the audio file

# Save Audio so it is not lost when we interact with a button (so we can display it on the results page)
update_session_state("audio_file", uploaded_file)</code></pre>



<pre class="wp-block-code"><code class="">### Add this line to the transcription() function, after the if txt_text != ""

# Save txt_text
update_session_state("txt_transcript", txt_text)</code></pre>



<p>Thanks to the content saved in our session state variables (<em>audio_file</em>, <em>process, txt_transcript</em>), we are ready to create our results page.</p>



<p><strong>3.2 Create the results page</strong> <strong>and switch to it</strong></p>



<p>First, we have to tell <em>Streamlit</em> that <strong>clicking the download button</strong> must <strong>change the <em>page_index</em></strong> <strong>value</strong>. Indeed, remember that its value determines which page of our app is displayed. </p>



<p>If this variable is 0, we will see the home page. If we click a download button, the app&#8217;s script is restarted and the transcript will disappear from the home page. But since the <em>page_index</em> value will now be set to 1 when a button is clicked, we will display the results page instead of the home page and we will no longer have an empty page. </p>



<p>To do this, we simply <strong>attach the previous function</strong> to our download button via the <em>on_click</em> parameter, telling the app to update the <em>page_index</em> session state variable from 0 to 1 (from the home page to the results page) when the button is clicked.</p>



<pre class="wp-block-code"><code class="">### Modify the code of the download button, in the transcription() function, at the end of the the if txt_text != "" statement

st.download_button("Download as TXT", txt_text, file_name="my_transcription.txt", on_click=update_session_state, args=("page_index", 1,))</code></pre>



<p>Now that the <em>page_index</em> value is updated, we need to check its value to know if the displayed page should be the home page or the results page. </p>



<p>We do this check in the main code of our app. You can <strong>replace the old main code with the following one</strong>:</p>



<pre class="wp-block-code"><code class="">if __name__ == '__main__':
    config()

    # Default page
    if st.session_state['page_index'] == 0:
        choice = st.radio("Features", ["By a video URL", "By uploading a file"])

        stt_tokenizer, stt_model = load_models()

        if choice == "By a video URL":
            transcript_from_url(stt_tokenizer, stt_model)

        elif choice == "By uploading a file":
            transcript_from_file(stt_tokenizer, stt_model)

    # Results page
    elif st.session_state['page_index'] == 1:
        # Display Results page
        display_results()</code></pre>



<p>Now that we have created this page, all that remains is to display elements on it (titles, buttons, audio file, transcript)! </p>



<pre class="wp-block-code"><code class="">def display_results():

    st.button("Load an other file", on_click=update_session_state, args=("page_index", 0,))
    st.audio(st.session_state['audio_file'])

    # Display results of transcription by steps
    if st.session_state["process"] != []:
        for elt in (st.session_state['process']):

            # Timestamp
            st.write(elt[0])

            # Transcript for this timestamp
            st.write(elt[1])

    # Display final text
    st.subheader("Final text is")
    st.write(st.session_state["txt_transcript"])

    
    # Download your transcription.txt
    st.download_button("Download as TXT", st.session_state["txt_transcript"], file_name="my_transcription.txt")</code></pre>



<p>👀 You may have noticed that at the beginning of the previous function, we added a <strong>&#8220;<em>Load an other file</em>&#8221; button</strong>. If you look at it, you will see it has a <strong>callback function that resets the <em>page_index</em> to 0</strong>. In other words, this button <strong>allows the user to return to the home page</strong> to transcribe another file.</p>



<p>Now let&#8217;s see what happens when we interact with this download button:</p>



<figure class="wp-block-video"><video height="1080" style="aspect-ratio: 1920 / 1080;" width="1920" controls src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/click_button_streamlit_solve-3.mov"></video></figure>



<p class="has-text-align-center"><em>Video illustrating the solved issue with Streamlit button widgets</em></p>



<p>As you can see, <strong>clicking the download button no longer makes the transcript disappear</strong>, thanks to our results page! We still have the <em>st.audio()</em> widget, the <em>process</em> as well as the final text and the download button. We have solved our problem!</p>



<p><strong>3.3 Jump the audio player to each timestamp</strong></p>



<p>Our Speech-To-Text application would be so much better if the timestamps were displayed as buttons, so that the user can <strong>click them and listen to the corresponding audio segment</strong> in the audio player widget we placed. Now that we know how to manipulate session state variables and callback functions, there is not much left to do 😁!</p>



<p>First, <strong>define a new session state variable</strong> in the config() function named <em><strong>start_time</strong></em>. It will indicate to our app where the <strong>starting point</strong> of the <em>st.audio() </em>widget should be. For the moment, it is always at 0s.</p>



<pre class="wp-block-code"><code class="">### Add this initialization to the config() function, with the other session state variables

st.session_state["start_time"] = 0</code></pre>



<p>Then, we define a new <strong>callback function</strong> that handles a <strong>timestamp button click</strong>. Just like before, it needs to <strong>redirect us to the results page</strong>, as we do not want the transcript to disappear. But it also needs to <strong>update the <em>start_time</em> variable</strong> to the beginning value of the timestamp button clicked by the user, so the starting point of the audio player can change. </p>



<p>For example, if the timestamp is [10s &#8211; 20s], we will set the starting point of the audio player to 10 seconds so that the user can check on the audio player the generated transcript for this part.</p>



<p>Here is the new callback function:</p>



<pre class="wp-block-code"><code class="">def click_timestamp_btn(sub_start):
    """
    When user clicks a Timestamp button, we go to the results page and st.audio is set to the sub_start value.
    It allows the user to listen to the considered part of the audio
    :param sub_start: Beginning of the considered transcript (ms)
    """

    update_session_state("page_index", 1)
    update_session_state("start_time", int(sub_start / 1000)) # division to convert ms to s</code></pre>



<p>Now, we need to <strong>replace</strong> the timestamp text with a timestamp button, so we can click it.</p>



<p>To do this, just replace the <em>st.write(temp_timestamps)</em> in the <strong><em>display_transcription()</em></strong> function <strong>with a button widget</strong> that calls our new callback function, with <em>sub_start</em> as an argument, which corresponds to the beginning of the timestamp. In the previous example, <em>sub_start</em> would be 10s.</p>



<pre class="wp-block-code"><code class="">### Modify the code that displays the temp_timestamps variable, in the display_transcription() function
st.button(temp_timestamps, on_click=click_timestamp_btn, args=(sub_start,))</code></pre>



<p>To make it work, we also need to <strong>modify 3 lines of code in the <em>display_results()</em></strong> function, which manages the results page, because it needs to:</p>



<ul class="wp-block-list">
<li>Make the <em>st.audio()</em> widget start from the <em>start_time</em> session state value</li>



<li>Display timestamps of the results page as buttons instead of texts</li>



<li>Call the <em>update_session_state()</em> function when we click one of these buttons to update the <em>start_time</em> value, so it changes the starting point of the audio player </li>
</ul>



<pre class="wp-block-code"><code class="">def display_results():
    st.button("Load an other file", on_click=update_session_state, args=("page_index", 0,))
    st.audio(st.session_state['audio_file'], start_time=st.session_state["start_time"])

    # Display results of transcription by steps
    if st.session_state["process"] != []:
        for elt in (st.session_state['process']):
            # Timestamp
            st.button(elt[0], on_click=update_session_state, args=("start_time", elt[2],))
            
            #Transcript for this timestamp
            st.write(elt[1])

    # Display final text
    st.subheader("Final text is")
    st.write(st.session_state["txt_transcript"])

    # Download your transcription.txt
    st.download_button("Download as TXT", st.session_state["txt_transcript"], file_name="my_transcription.txt")</code></pre>



<p>When you&#8217;ve done this, each timestamp button (home page and results page) will be able to change the starting point of the audio player, as you can see on this video:</p>



<figure class="wp-block-video"><video height="1080" style="aspect-ratio: 1920 / 1080;" width="1920" controls src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/click_timestamp_btn.mov"></video></figure>



<p class="has-text-align-center"><em>Video illustrating the timestamp button click</em></p>



<p>This feature is really useful to easily check each of the obtained transcripts! </p>



<h4 class="wp-block-heading">4. Preparing new functionalities</h4>



<p>Now that the application is taking shape, it is time to add the many features we studied in the <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/natural-language-processing/speech-to-text/miniconda" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">notebooks</a>.</p>



<p>Among these options are the possibility to:</p>



<ul class="wp-block-list">
<li>Trim/Cut an audio, if the user wants to transcribe only a specific part of the audio file</li>



<li>Differentiate speakers (Diarization)</li>



<li>Punctuate the transcript</li>



<li>Summarize the transcript</li>



<li>Generate subtitles for videos</li>



<li>Change the speech-to-text model to a better one (transcription will take longer)</li>



<li>Show or hide the timestamps</li>
</ul>
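<p>As a small preview of the subtitles option, the SRT format expects cue timestamps written as <em>HH:MM:SS,mmm</em>. Formatting a duration in milliseconds can be sketched as below; the helper name is ours, not the app&#8217;s:</p>

```python
def srt_timestamp(ms):
    """Format a duration in milliseconds as an SRT timestamp: HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

print(srt_timestamp(83_456))     # 00:01:23,456
print(srt_timestamp(3_600_000))  # 01:00:00,000
```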



<p><strong>4.1 Let the user enable these functionalities or not</strong></p>



<p>First of all, we need to provide users with a way to customize their transcript by <strong>choosing the options they want to activate</strong>.</p>



<p>➡️ To do this, we will use sliders &amp; check boxes as shown in the screenshot below:</p>



<figure class="wp-block-image aligncenter size-full is-resized is-style-default"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-7.png" alt="sliders checkboxes streamlit app" class="wp-image-23746" width="543" height="264" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-7.png 718w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-7-300x146.png 300w" sizes="auto, (max-width: 543px) 100vw, 543px" /><figcaption class="wp-element-caption">Overview of the displayed options</figcaption></figure>



<p>Add the following function to your code. It will display all the options on our application.</p>



<pre class="wp-block-code"><code class="">def load_options(audio_length, dia_pipeline):
    """
    Display options so the user can customize the result (punctuate, summarize the transcript ? trim the audio? ...)
    User can choose his parameters thanks to sliders &amp; checkboxes, both displayed in a st.form so the page doesn't
    reload when interacting with an element (frustrating if it does because user loses fluidity).
    :return: the chosen parameters
    """
    # Create a st.form()
    with st.form("form"):
        st.markdown("""&lt;h6&gt;
            You can transcribe a specific part of your audio by setting start and end values below (in seconds). Then, 
            choose your parameters.&lt;/h6&gt;""", unsafe_allow_html=True)

        # Possibility to trim / cut the audio to a specific part (=&gt; transcribing fewer seconds saves time)
        # To perform that, user selects his time intervals thanks to sliders, displayed in 2 different columns
        col1, col2 = st.columns(2)
        with col1:
            start = st.slider("Start value (s)", 0, audio_length, value=0)
        with col2:
            end = st.slider("End value (s)", 0, audio_length, value=audio_length)

        # Create 3 new columns to display other options
        col1, col2, col3 = st.columns(3)

        # User selects his preferences with checkboxes
        with col1:
            # Get an automatic punctuation
            punctuation_token = st.checkbox("Punctuate my final text", value=True)

            # Differentiate Speakers
            if dia_pipeline is None:
                st.write("Diarization model unavailable")
                diarization_token = False
            else:
                diarization_token = st.checkbox("Differentiate speakers")

        with col2:
            # Summarize the transcript
            summarize_token = st.checkbox("Generate a summary", value=False)

            # Generate a SRT file instead of a TXT file (shorter timestamps)
            srt_token = st.checkbox("Generate subtitles file", value=False)

        with col3:
            # Display the timestamp of each transcribed part
            timestamps_token = st.checkbox("Show timestamps", value=True)

            # Improve transcript with another model (better transcript but longer to obtain)
            choose_better_model = st.checkbox("Change STT Model")

        # Srt option requires timestamps so it can match text with time =&gt; Need to correct the following case
        if not timestamps_token and srt_token:
            timestamps_token = True
            st.warning("Srt option requires timestamps. We activated it for you.")

        # Validate choices with a button
        transcript_btn = st.form_submit_button("Transcribe audio!")

    return transcript_btn, start, end, diarization_token, punctuation_token, timestamps_token, srt_token, summarize_token, choose_better_model</code></pre>



<p>This function is very simple to understand:</p>



<p>First of all, we display all the options in a <em><strong>st.form()</strong></em>, so the page doesn&#8217;t reload each time the user interacts with an element (a <em>Streamlit</em> behavior that would be frustrating here because it wastes time). If you are curious, you can run your app without the <em>st.form()</em> to observe the problem 😊.</p>



<p>Then, we create some <strong>columns</strong>. They allow us to display the elements aligned, one under the other, to improve the visual appearance. Here too, you could display the elements one after the other without using columns, but it would look different.</p>



<p>We will call this function in the <em>transcription()</em> function, in the next article 😉. But if you want to test it now, you can call this function after the <em>st.audio()</em> widget. Just keep in mind that this will only display the options, but it won&#8217;t change the result since the options are not implemented yet.</p>



<p><strong>4.2 Session state variables</strong></p>



<p>To interact with these features, we need to initialize more session state variables (I swear these are the last ones 🙃):</p>



<pre class="wp-block-code"><code class="">### Add new initialization to our config() function

st.session_state['srt_token'] = 0  # Is subtitles parameter enabled or not
st.session_state['srt_txt'] = ""  # Save the transcript in a subtitles case to display it on the results page
st.session_state["summary"] = ""  # Save the summary of the transcript so we can display it on the results page
st.session_state["number_of_speakers"] = 0  # Save the number of speakers detected in the conversation (diarization)
st.session_state["chosen_mode"] = 0  # Save the mode chosen by the user (Diarization or not, timestamps or not)
st.session_state["btn_token_list"] = []  # List of tokens that indicates what options are activated to adapt the display on results page
st.session_state["my_HF_token"] = "ACCESS_TOKEN_GOES_HERE"  # User's Token that allows the use of the diarization model
st.session_state["disable"] = True  # Default appearance of the button to change your token
</code></pre>



<p>To quickly introduce you to their usefulness:</p>



<ul class="wp-block-list">
<li><em>srt_token: </em>Indicates whether the user has activated the subtitles option in the form</li>



<li><em>srt_txt</em>: Contains the transcript in subtitles format (.SRT), in order to save it when we click a button</li>



<li><em>summary</em>: Contains the short transcript given by the summarization model, for the same reason</li>



<li><em>number_of_speakers</em>: Number of speakers detected by the diarization algorithm in the audio recording</li>



<li><em>chosen_mode</em>: Indicates what options the user has selected so we know which information should be displayed (timestamps? results of diarization?)</li>



<li><em>btn_token_list</em>: Handles which buttons should be displayed. You will understand why it is needed in the next article</li>



<li><em>my_HF_token</em>: Save the user&#8217;s token that allows the use of the diarization model</li>



<li><em>disable</em>: Boolean that makes the button for changing the user&#8217;s token clickable or not (it stays disabled as long as no token has been added)</li>
</ul>



<p>You will also need to <strong>add the following line of code to the <em>init_transcription()</em></strong> function:</p>



<pre class="wp-block-code"><code class=""># Add this line to the init_transcription() function
update_session_state("summary", "")</code></pre>



<p>This will reset the summary for each new audio file transcribed.</p>



<p><strong>4.3 Import the models</strong></p>



<p>Of course, to interact with these functionalities, we need to <strong>load new A.I. models</strong>. </p>



<p>⚠️ Reminder: We have used each of them in the previous <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/natural-language-processing/speech-to-text/miniconda" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">notebook tutorials</a>. We will not re-explain their usefulness here.</p>



<p><strong>4.3.1 Create a token to access the diarization model</strong></p>



<p>Since version 2 of the <em>pyannote.audio</em> library, an <strong>access token</strong> is <strong>mandatory</strong> in order to use the diarization model (which enables speaker differentiation).</p>



<p>To create your access token, you will need to:</p>



<ul class="wp-block-list">
<li><a href="https://huggingface.co/join" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Create an <em>Hugging Face</em> account</a> and <strong>verify your email address</strong></li>



<li>Visit the <em><a href="http://hf.co/pyannote/speaker-diarization" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">speaker-diarization</a></em> page and the <em><a href="http://hf.co/pyannote/segmentation" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">segmentation</a></em> page, and <strong>accept user conditions </strong>on both pages (only if requested)</li>



<li>Visit the <a href="http://hf.co/settings/tokens" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">token page</a> to <strong>create</strong> an access token (Read Role)</li>
</ul>
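<p>Rather than hardcoding the token in the source (the <em>ACCESS_TOKEN_GOES_HERE</em> placeholder used in the <em>config()</em> function), one possibility is to read it from an environment variable. This is only a sketch, and <em>HF_TOKEN</em> is an assumed variable name, not one required by any library:</p>

```python
import os

# Demo: pretend the token was exported before launching the app,
# e.g. with `export HF_TOKEN=hf_xxx` in the terminal (assumed variable name)
os.environ.setdefault("HF_TOKEN", "ACCESS_TOKEN_GOES_HERE")

# Read the token instead of hardcoding it in the source file
my_hf_token = os.environ["HF_TOKEN"]
print(my_hf_token)
```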



<p><strong>4.3.2 Load the models</strong></p>



<p>Once you have your token, you can <strong>modify the code</strong> of the <em>load_models()</em> function to <strong>add the models</strong>:</p>



<pre class="wp-block-code"><code class="">@st.cache(allow_output_mutation=True)
def load_models():
    """
    Instead of systematically re-downloading the models we use (transcript model, summarizer, speaker differentiation, ...)
    through transformers' pipelines, we first try to import them locally to save time when the app is launched.
    This function has st.cache() because the models never change, so we want it to execute only once
    (also to save time). Otherwise, it would run every time we transcribe a new audio file.
    :return: Loaded models
    """

    # Load facebook-hubert-large-ls960-ft model (English speech to text model)
    with st.spinner("Loading Speech to Text Model"):
        # If models are stored in a folder, we import them. Otherwise, we import the models with their respective library

        try:
            stt_tokenizer = pickle.load(open("models/STT_processor_hubert-large-ls960-ft.sav", 'rb'))
        except FileNotFoundError:
            stt_tokenizer = Wav2Vec2Processor.from_pretrained("facebook/hubert-large-ls960-ft")

        try:
            stt_model = pickle.load(open("models/STT_model_hubert-large-ls960-ft.sav", 'rb'))
        except FileNotFoundError:
            stt_model = HubertForCTC.from_pretrained("facebook/hubert-large-ls960-ft")

    # Load T5 model (Auto punctuation model)
    with st.spinner("Loading Punctuation Model"):
        try:
            t5_tokenizer = torch.load("models/T5_tokenizer.sav")
        except OSError:
            t5_tokenizer = T5Tokenizer.from_pretrained("flexudy/t5-small-wav2vec2-grammar-fixer")

        try:
            t5_model = torch.load("models/T5_model.sav")
        except FileNotFoundError:
            t5_model = T5ForConditionalGeneration.from_pretrained("flexudy/t5-small-wav2vec2-grammar-fixer")

    # Load summarizer model
    with st.spinner("Loading Summarization Model"):
        try:
            summarizer = pickle.load(open("models/summarizer.sav", 'rb'))
        except FileNotFoundError:
            summarizer = pipeline("summarization")

    # Load Diarization model (Differentiate speakers)
    with st.spinner("Loading Diarization Model"):
        try:
            dia_pipeline = pickle.load(open("models/dia_pipeline.sav", 'rb'))
        except FileNotFoundError:
            dia_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1", use_auth_token=st.session_state["my_HF_token"])

            #If the token hasn't been modified, dia_pipeline will automatically be set to None. The functionality will then be disabled.

    return stt_tokenizer, stt_model, t5_tokenizer, t5_model, summarizer, dia_pipeline</code></pre>



<p>⚠️ <strong>Don&#8217;t forget to replace the <em>use_auth_token</em></strong><em>=&#8221;ACCESS TOKEN GOES HERE&#8221;</em> value <strong>with your personal token</strong>. Otherwise, the app will be launched without the diarization functionality.</p>



<p>As the <em>load_models()</em> function now returns 6 variables (instead of 2), we need to <strong>change the line of code that calls this function</strong> to avoid an error. This line is <strong>in the main code</strong>:</p>



<pre class="wp-block-code"><code class=""># Replace the load_models() code line call in the main code by the following one
stt_tokenizer, stt_model, t5_tokenizer, t5_model, summarizer, dia_pipeline = load_models()</code></pre>



<p>As we discussed in the previous article, having <strong>more models makes the initialization </strong>of the speech to text app <strong>longer</strong>. You will notice this if you run the app.</p>



<p>➡️ This is why we now propose <strong>two ways to import the models</strong> in the previous function. </p>



<p>The first one, used by default, consists in looking for the models in a folder where we save all the models used by our app. If a model is found there, we import it instead of downloading it. This avoids depending on the download speed of the internet connection and makes the application usable as soon as it is launched! </p>



<p>If we don&#8217;t find the model in this folder, we fall back on the second solution, which is what we have always done so far: <strong>download the models from their libraries</strong>. The problem is that downloading all the models takes several minutes before the application can launch, which is quite frustrating.</p>
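<p>The local-folder import relies on the models having been pickled beforehand. Saving one could look like the following sketch, where the helper names and the <em>models/</em> path are ours, and a small stand-in object replaces a real model:</p>

```python
import os
import pickle

def save_model_locally(model, path):
    """Pickle an already-loaded object so a try/except import can find it locally."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model_locally(path):
    """Unpickle a previously saved object."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Demo with a stand-in object instead of a real model
save_model_locally({"weights": [1, 2, 3]}, "models/demo.sav")
print(load_model_locally("models/demo.sav"))  # {'weights': [1, 2, 3]}
```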



<p>➡️ We will show you how you can save these models in a folder in the documentation that will help you to deploy your project on AI Deploy.</p>



<h3 class="wp-block-heading">Conclusion</h3>



<p class="has-text-align-left">Well done 🥳 ! Your application is now visually pleasing and offers more interactivity thanks to the download button and those that allow you to play with the audio player. You also managed to create a form that will allow the user to indicate what functionalities he wants use!</p>



<p>➡️ Now it&#8217;s time to create these features and add them to our application! You can discover how in <a href="https://blog.ovhcloud.com/how-to-build-a-speech-to-text-application-with-python-3-3" target="_blank" rel="noreferrer noopener" data-wpel-link="internal">the next article</a> 😉.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fhow-to-build-a-speech-to-text-application-with-python-2-3%2F&amp;action_name=How%20to%20build%20a%20Speech-To-Text%20Application%20with%20Python%20%282%2F3%29&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2022/11/speech_to_text_app_click_button_issue.mp4" length="4867513" type="video/mp4" />
<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2022/07/click_button_streamlit_solve-3.mov" length="6829193" type="video/quicktime" />
<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2022/07/click_timestamp_btn.mov" length="3781979" type="video/quicktime" />

			</item>
		<item>
		<title>How to build a Speech-To-Text application with Python (1/3)</title>
		<link>https://blog.ovhcloud.com/how-to-build-a-speech-to-text-application-with-python-1-3/</link>
		
		<dc:creator><![CDATA[Mathieu Busquet]]></dc:creator>
		<pubDate>Mon, 05 Dec 2022 09:06:15 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Apps]]></category>
		<category><![CDATA[AI Solutions]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[PyTorch]]></category>
		<category><![CDATA[Streamlit]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=23229</guid>

<description><![CDATA[A tutorial to create and build your own Speech-To-Text application with Python. At the end of this first article, your Speech-To-Text application will be able to receive an audio recording and will generate its transcript! Final code of the app is available in our dedicated GitHub repository. Overview of our final app Overview of our [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p style="font-size:16px"><em>A tutorial to create and build your own <strong>Speech-To-Text application</strong></em> with <em>Python</em>.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="576" src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-1-1024x576.png" alt="speech to text app image1" class="wp-image-24058" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-1-1024x576.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-1-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-1-768x432.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-1-1536x864.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/speech-to-text-app-1.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>At the end of this first article, your Speech-To-Text application will be able to receive an audio recording and will generate its transcript! </p>



<p><em>Final code of the app is available in our dedicated <a href="https://github.com/ovh/ai-training-examples/tree/main/apps/streamlit/speech-to-text" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>.</em></p>



<h3 class="wp-block-heading">Overview of our final app</h3>



<figure class="wp-block-image aligncenter size-large is-style-default"><img loading="lazy" decoding="async" width="1024" height="575" src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview-1024x575.png" alt="Overview of our final Speech-To-Text application" class="wp-image-23277" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview-1024x575.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview-768x432.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview-1536x863.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/App_Overview.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p class="has-text-align-center"><em>Overview of our final Speech-To-Text application</em></p>



<h3 class="wp-block-heading">Objective</h3>



<p style="font-size:16px">In the previous <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/natural-language-processing/speech-to-text/conda" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">notebook </a><a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/natural-language-processing/speech-to-text/miniconda" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">tutorials</a>, we have seen how to <strong>translate speech into text</strong>, how to <strong>punctuate</strong> the transcript and <strong>summarize</strong> it. We have also seen how to <strong>distinguish speakers</strong> and how to generate <strong>video subtitles</strong>, all the while managing potential<strong> memory problems</strong>.</p>



<p>Now that we know how to do all this, let&#8217;s combine all these features together into a <strong>Speech-To-Text application</strong> using <em>Python</em>!</p>



<p>➡ To create this app, we will use <a href="https://streamlit.io/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external"><em>Streamlit</em></a>, a Python framework that turns scripts into a shareable web application. If you don&#8217;t know this tool, don&#8217;t worry, it is very simple to use.</p>



<p>This article is organized as follows:</p>



<ul class="wp-block-list">
<li>Import code from previous tutorials</li>



<li>Write the Streamlit App</li>



<li>Run your app!</li>
</ul>



<p>In the following articles, we will see how to implement the more advanced features (diarization, summarization, punctuation, …), and we will also learn how to build and use a custom <em>Docker</em> image for a <em>Streamlit</em> application, which will allow us to deploy our app on <a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/getting-started/" target="_blank" rel="noreferrer noopener" data-wpel-link="exclude">AI Deploy</a> !</p>



<p><em>⚠️ Since this article uses code already explained in the previous <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/natural-language-processing/speech-to-text/miniconda" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">notebook tutorials</a>, we will not re-explain it here. We therefore recommend that you read the notebooks first.</em></p>



<h3 class="wp-block-heading">Import code from previous tutorials</h3>



<h4 class="wp-block-heading">1. Set up the environment </h4>



<p>To start, let&#8217;s create our <em>Python</em> environment. To do this, <strong>create</strong> a file named <em>requirements.txt</em> and <strong>add</strong> the following text to it. This will allow us to pin the version of each library required by our Speech-To-Text project.</p>



<pre class="wp-block-code"><code class="">librosa==0.9.1
youtube_dl==2021.12.17
streamlit==1.9.0
transformers==4.18.0
httplib2==0.20.2
torch==1.11.0
torchaudio==0.11.0
sentencepiece==0.1.96
tokenizers==0.12.1
pyannote.audio==2.1.1
pyannote.core==4.4
pydub==0.25.1</code></pre>



<p>Then, you can <strong>install all these elements</strong> in only one command. To do so, you just have to <strong>open a terminal</strong> and <strong>enter the following command</strong>:</p>



<pre class="wp-block-code"><code class="">pip install -r requirements.txt</code></pre>



<h4 class="wp-block-heading">2. Import libraries</h4>



<p>Once your environment is ready, <strong>create</strong> a file named <em>app.py</em> and <strong>import the required libraries</strong> we used in the <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/natural-language-processing/speech-to-text/miniconda" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">notebooks</a>. </p>



<p>They will allow us to use artificial intelligence models, manipulate audio files, handle timestamps, and more.</p>



<pre class="wp-block-code"><code class=""># Models
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2Tokenizer, HubertForCTC

# Audio Manipulation
import audioread
import librosa
from pydub import AudioSegment, silence
import youtube_dl
from youtube_dl import DownloadError

# Others
from datetime import timedelta
import os
import streamlit as st
import time</code></pre>



<h4 class="wp-block-heading">3. Functions</h4>



<p>We also need some of the functions from the previous tutorials; you will probably recognize them.</p>



<p>⚠️ <em>Reminder:</em> <em>All this code has been explained in the <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/natural-language-processing/speech-to-text/miniconda" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">notebook tutorials</a>. </em>That&#8217;s why we will not re-explain it here.</p>



<p>To begin, let&#8217;s create the function that allows you to <strong>transcribe an audio chunk</strong>.</p>



<pre class="wp-block-code"><code class="">def transcribe_audio_part(filename, stt_model, stt_tokenizer, myaudio, sub_start, sub_end, index):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    try:
        with torch.no_grad():
            new_audio = myaudio[sub_start:sub_end]  # Works in milliseconds
            path = filename[:-3] + "audio_" + str(index) + ".mp3"
            new_audio.export(path)  # Exports to a mp3 file in the current path

            # Load audio file with librosa, set sound rate to 16000 Hz because the model we use was trained on 16000 Hz data
            input_audio, _ = librosa.load(path, sr=16000)

            # return PyTorch torch.Tensor instead of a list of python integers thanks to return_tensors = ‘pt’
            input_values = stt_tokenizer(input_audio, return_tensors="pt").to(device).input_values

            # Get logits from the data structure containing all the information returned by the model and get our prediction
            logits = stt_model.to(device)(input_values).logits
            prediction = torch.argmax(logits, dim=-1)
           
            # Decode &amp; lower our string (model's output is only uppercase)
            if isinstance(stt_tokenizer, Wav2Vec2Tokenizer):
                transcription = stt_tokenizer.batch_decode(prediction)[0]
            elif isinstance(stt_tokenizer, Wav2Vec2Processor):
                transcription = stt_tokenizer.decode(prediction[0])

            # return transcription
            return transcription.lower()

    except audioread.NoBackendError:
        # Means we have a chunk with a [value1 : value2] case with value1&gt;value2
        st.error("Sorry, seems we have a problem on our side. Please change start &amp; end values.")
        time.sleep(3)
        st.stop()</code></pre>
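<p>A quick aside on the chunk path built above: the function strips the three-character extension and appends an index. A minimal sketch of that naming logic (the filename below is hypothetical):</p>

```python
# Reproduce the chunk-naming logic used in transcribe_audio_part():
# drop the 3-character extension, then append "audio_" + index + ".mp3"
def chunk_path(filename, index):
    return filename[:-3] + "audio_" + str(index) + ".mp3"

print(chunk_path("../data/interview.mp3", 0))
# ../data/interview.audio_0.mp3
```

<p>Note that this assumes a three-character extension (mp3, mp4, wav), which matches the file types the app accepts.</p>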



<p>Then, create the four functions that implement the <em>silence detection</em> method, which we explained in the <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/natural-language-processing/speech-to-text/miniconda/basics/speech-to-text-basics.ipynb" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">first notebook tutorial</a>.</p>



<p><strong>Get the timestamps of the silences</strong></p>



<pre class="wp-block-code"><code class="">def detect_silences(audio):

    # Get Decibels (dB) so silences detection depends on the audio instead of a fixed value
    dbfs = audio.dBFS

    # Get silences timestamps &gt; 750ms
    silence_list = silence.detect_silence(audio, min_silence_len=750, silence_thresh=dbfs-14)

    return silence_list</code></pre>



<p><strong>Get the middle value of each timestamp</strong></p>



<pre class="wp-block-code"><code class="">def get_middle_silence_time(silence_list):

    length = len(silence_list)
    index = 0
    while index &lt; length:
        diff = (silence_list[index][1] - silence_list[index][0])
        if diff &lt; 3500:
            silence_list[index] = silence_list[index][0] + diff/2
            index += 1
        else:

            adapted_diff = 1500
            silence_list.insert(index+1, silence_list[index][1] - adapted_diff)
            silence_list[index] = silence_list[index][0] + adapted_diff

            length += 1
            index += 2

    return silence_list</code></pre>
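<p>To see what this function does concretely, here is a worked example on hypothetical timestamps (in milliseconds): a short silence collapses to its midpoint, while a long one (3.5 s or more) is split into two cut points, 1.5 s inside each edge. The function is repeated so the sketch is self-contained:</p>

```python
# Same logic as get_middle_silence_time(), run on made-up [start, end]
# silence pairs expressed in milliseconds
def get_middle_silence_time(silence_list):
    length = len(silence_list)
    index = 0
    while index < length:
        diff = silence_list[index][1] - silence_list[index][0]
        if diff < 3500:
            # Short silence: keep only its midpoint
            silence_list[index] = silence_list[index][0] + diff / 2
            index += 1
        else:
            # Long silence: two cut points, 1500 ms inside each edge
            adapted_diff = 1500
            silence_list.insert(index + 1, silence_list[index][1] - adapted_diff)
            silence_list[index] = silence_list[index][0] + adapted_diff
            length += 1
            index += 2
    return silence_list

print(get_middle_silence_time([[1000, 2000], [5000, 10000]]))
# [1500.0, 6500, 8500]
```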



<p><strong>Create a regular distribution</strong>, which merges the timestamps according to a <em>min_space</em> and a <em>max_space</em> value.</p>



<pre class="wp-block-code"><code class="">def silences_distribution(silence_list, min_space, max_space, start, end, srt_token=False):

    # If start != 0, we need to adjust the end value since silence detection is performed on the trimmed/cut audio
    # (and not on the original audio) (ex: trim audio from 20s to 2m will be 0s to 1m40 = 2m-20s)

    # Shift the end according to the start value
    end -= start
    start = 0
    end *= 1000

    # Step 1 - Add start value
    newsilence = [start]

    # Step 2 - Create a regular distribution between start and the first element of silence_list so that we don't have a gap &gt; max_space and run out of memory
    # example newsilence = [0] and silence_list starts with 100000 =&gt; It will create a massive gap [0, 100000]

    if silence_list[0] - max_space &gt; newsilence[0]:
        for i in range(int(newsilence[0]), int(silence_list[0]), max_space):  # int bc float can't be in a range loop
            value = i + max_space
            if value &lt; silence_list[0]:
                newsilence.append(value)

    # Step 3 - Create a regular distribution until the last value of the silence_list
    min_desired_value = newsilence[-1]
    max_desired_value = newsilence[-1]
    nb_values = len(silence_list)

    while nb_values != 0:
        max_desired_value += max_space

        # Get a window of the values greater than min_desired_value and lower than max_desired_value
        silence_window = list(filter(lambda x: min_desired_value &lt; x &lt;= max_desired_value, silence_list))

        if silence_window != []:
            # Get the nearest value we can to min_desired_value or max_desired_value depending on srt_token
            if srt_token:
                nearest_value = min(silence_window, key=lambda x: abs(x - min_desired_value))
                nb_values -= silence_window.index(nearest_value) + 1  # (index begins at 0, so we add 1)
            else:
                nearest_value = min(silence_window, key=lambda x: abs(x - max_desired_value))
                # Max value index = len of the list
                nb_values -= len(silence_window)

            # Append the nearest value to our list
            newsilence.append(nearest_value)

        # If silence_window is empty we add the max_space value to the last one to create an automatic cut and avoid multiple audio cutting
        else:
            newsilence.append(newsilence[-1] + max_space)

        min_desired_value = newsilence[-1]
        max_desired_value = newsilence[-1]

    # Step 4 - Add the final value (end)

    if end - newsilence[-1] &gt; min_space:
        # Gap &gt; Min Space
        if end - newsilence[-1] &lt; max_space:
            newsilence.append(end)
        else:
            # Gap too important between the last list value and the end value
            # We need to create automatic max_space cut till the end
            newsilence = generate_regular_split_till_end(newsilence, end, min_space, max_space)
    else:
        # Gap &lt; Min Space &lt;=&gt; Final value and last value of new silence are too close, need to merge
        if len(newsilence) &gt;= 2:
            if end - newsilence[-2] &lt;= max_space:
                # Replace if gap is not too important
                newsilence[-1] = end
            else:
                newsilence.append(end)

        else:
            if end - newsilence[-1] &lt;= max_space:
                # Replace if gap is not too important
                newsilence[-1] = end
            else:
                newsilence.append(end)

    return newsilence</code></pre>



<p><strong>Add automatic &#8220;time cuts&#8221;</strong> to the silence list till end value depending on <em>min_space</em> and <em>max_space</em> values:</p>



<pre class="wp-block-code"><code class="">def generate_regular_split_till_end(time_list, end, min_space, max_space):

    # In range loop can't handle float values so we convert to int
    int_last_value = int(time_list[-1])
    int_end = int(end)

    # Add maxspace to the last list value and add this value to the list
    for i in range(int_last_value, int_end, max_space):
        value = i + max_space
        if value &lt; end:
            time_list.append(value)

    # Fix last automatic cut
    # If small gap (ex: 395 000, with end = 400 000)
    if end - time_list[-1] &lt; min_space:
        time_list[-1] = end
    else:
        # If important gap (ex: 311 000 then 356 000, with end = 400 000, can't replace and then have 311k to 400k)
        time_list.append(end)
    return time_list</code></pre>
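<p>Here is a worked example of this function, repeated so the sketch is self-contained, with hypothetical values matching the ones the app uses later (<em>min_space</em> = 25000 ms, <em>max_space</em> = 45000 ms):</p>

```python
# Same logic as generate_regular_split_till_end(): add a cut every
# max_space ms, then fix the last cut depending on min_space
def generate_regular_split_till_end(time_list, end, min_space, max_space):
    int_last_value = int(time_list[-1])
    int_end = int(end)
    for i in range(int_last_value, int_end, max_space):
        value = i + max_space
        if value < end:
            time_list.append(value)
    # If the remaining gap is small, replace the last cut; otherwise append end
    if end - time_list[-1] < min_space:
        time_list[-1] = end
    else:
        time_list.append(end)
    return time_list

print(generate_regular_split_till_end([0], 100000, 25000, 45000))
# [0, 45000, 100000]
```

<p>The 90000 ms cut is replaced by 100000 ms because the 10 s gap is smaller than <em>min_space</em>.</p>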



<p>Create a function to <strong>clean the directory</strong> where we save the sounds and the audio chunks, so we do not keep them after transcribing:</p>



<pre class="wp-block-code"><code class="">def clean_directory(path):

    for file in os.listdir(path):
        os.remove(os.path.join(path, file))</code></pre>
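<p>A minimal sketch of how this helper behaves, run on a hypothetical throwaway directory instead of <em>../data</em> (note that it only removes flat files, not subdirectories):</p>

```python
import os
import tempfile

def clean_directory(path):
    for file in os.listdir(path):
        os.remove(os.path.join(path, file))

# Hypothetical demo: fill a throwaway directory with fake chunks, then clean it
tmp = tempfile.mkdtemp()
for name in ("chunk_0.mp3", "chunk_1.mp3"):
    open(os.path.join(tmp, name), "w").close()

clean_directory(tmp)
print(os.listdir(tmp))
# []
```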



<h3 class="wp-block-heading">Write the Streamlit application code</h3>



<h4 class="wp-block-heading">1. Configuration of the application</h4>



<p>Now that we have the basics, we can create the function that allows us to <strong>configure the app</strong>. It will give a <strong>title</strong> and an <strong>icon</strong> to our app, and will create a <em>data</em> <strong>directory</strong> so that the application can <strong>store sound files</strong> in it. Here is the function:</p>



<pre class="wp-block-code"><code class="">def config():

    st.set_page_config(page_title="Speech to Text", page_icon="📝")
    
    # Create a data directory to store our audio files
    # Will not be executed with AI Deploy because it is indicated in the DockerFile of the app
    if not os.path.exists("../data"):
        os.makedirs("../data")
    
    # Display Text and CSS
    st.title("Speech to Text App 📝")

    st.subheader("You want to extract text from an audio/video? You are in the right place!")
</code></pre>



<p>As you can see, this <em>data</em> directory is located in the parent directory (indicated by the ../ notation). It will <strong>only be created</strong> if the application is launched <strong>locally</strong> on your computer, since <em>AI Deploy</em> has this folder <strong>pre-created</strong>.</p>



<p>➡️ We recommend that you do <strong>not change the location of the <em>data</em> directory (../)</strong>. Indeed, this location makes it easy to juggle between running the application locally or on AI Deploy.</p>
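<p>As a side note, the check-then-create logic in <em>config()</em> can also be written as a single call with <em>exist_ok=True</em>, which is safe to run whether or not the directory already exists. A minimal sketch on a hypothetical path (not the real <em>../data</em>):</p>

```python
import os
import tempfile

# Hypothetical stand-in for the ../data directory
base = tempfile.mkdtemp()
data_dir = os.path.join(base, "data")

os.makedirs(data_dir, exist_ok=True)  # creates the directory
os.makedirs(data_dir, exist_ok=True)  # no error on a second call

print(os.path.isdir(data_dir))
# True
```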



<h4 class="wp-block-heading">2. Load the speech to text model</h4>



<p>Create the function that allows us to <strong>load the speech to text model</strong>.</p>



<p>As we are starting out, we only import the transcription model for the moment. We will implement the other features in the following article 😉.</p>



<p>⚠️ <em>Here, the use case is English speech recognition, but you can do it in another language thanks to one of the many models available on the <a href="https://huggingface.co/models" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Hugging Face website</a></em>. In this case, just keep in mind that you won&#8217;t be able to combine it with some of the models we will use in the next article, since some of them only work on English transcripts.</p>



<pre class="wp-block-code"><code class="">@st.cache(allow_output_mutation=True)
def load_models():

    # Load HuBERT (transcriber model) and its Wav2Vec2 processor
    stt_model = HubertForCTC.from_pretrained("facebook/hubert-large-ls960-ft")
    stt_tokenizer = Wav2Vec2Processor.from_pretrained("facebook/hubert-large-ls960-ft")

    return stt_tokenizer, stt_model</code></pre>



<p>We use <em>@st.cache(allow_output_mutation=True)</em> here. This tells <em>Streamlit</em> to run the function and <strong>store the results in a local cache</strong>, so the next time we call the function (on an app refresh), <em>Streamlit</em> knows it can <strong>skip executing it</strong>. Indeed, since we have already imported the model(s) once (at app initialization), we must not waste time reloading them each time we want to transcribe a new file. </p>



<p>However, <strong>downloading the model </strong>when initializing the application<strong> takes time</strong> since it depends on certain factors such as our Internet connection. For one model, this is not a problem because the download time is still quite fast. But with all the models we plan to load in the next article, this initialization time may be longer, which would be frustrating 😪. </p>



<p>➡️ That&#8217;s why we will propose a way to <strong>solve this problem</strong> in a <strong>next blog post</strong>.</p>



<h4 class="wp-block-heading">3. Get an audio file</h4>



<p>Once we have loaded the model, we <strong>need an audio file</strong> to use it 🎵!</p>



<p>For this, we will implement two features. The first one will allow the user to <strong>import their own audio file</strong>. The second one will let them <strong>indicate a video URL</strong> for which they want to obtain the transcript.</p>



<h4 class="wp-block-heading">3.1. Allow the user to upload a file (mp3/mp4/wav)</h4>



<p>Let the user <strong>upload their own audio file</strong> thanks to a <em>st.file_uploader()</em> widget:</p>



<pre class="wp-block-code"><code class="">def transcript_from_file(stt_tokenizer, stt_model):

    uploaded_file = st.file_uploader("Upload your file! It can be a .mp3, .mp4 or .wav", type=["mp3", "mp4", "wav"])

    if uploaded_file is not None:
        # get name and launch transcription function
        filename = uploaded_file.name
        transcription(stt_tokenizer, stt_model, filename, uploaded_file)</code></pre>



<p>As you can see, if the <em>uploaded_file</em> variable is not None, which means the user has uploaded an audio file, we launch the transcription process by calling the <em>transcription()</em> function that we will soon create.</p>



<h4 class="wp-block-heading">3.2. Transcribe a video from YouTube</h4>



<p>Create the function that allows to <strong>download the audio from a valid YouTube link</strong>:</p>



<pre class="wp-block-code"><code class="">def extract_audio_from_yt_video(url):
    
    filename = "yt_download_" + url[-11:] + ".mp3"
    try:

        ydl_opts = {
            'format': 'bestaudio/best',
            'outtmpl': filename,
            'postprocessors': [{
                'key': 'FFmpegExtractAudio',
                'preferredcodec': 'mp3',
            }],
        }
        with st.spinner("We are extracting the audio from the video"):
            with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                ydl.download([url])

    # Handle DownloadError: ERROR: unable to download video data: HTTP Error 403: Forbidden / happens sometimes
    except DownloadError:
        filename = None

    return filename</code></pre>
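<p>Note that the download name is derived from the last 11 characters of the URL, which assumes a standard YouTube link ending with the 11-character video ID (a URL with extra query parameters would break this assumption). A quick check with a made-up ID:</p>

```python
# extract_audio_from_yt_video() names the file after the last 11 URL
# characters, i.e. the video ID for standard YouTube watch URLs
url = "https://www.youtube.com/watch?v=AbCdEfGhIjk"  # made-up video ID
filename = "yt_download_" + url[-11:] + ".mp3"

print(filename)
# yt_download_AbCdEfGhIjk.mp3
```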



<p>⚠️ If you are not the administrator of your computer, this function may not work for local execution.</p>



<p>Then, we need to <strong>display an element that allows the user to indicate the URL</strong> they want to transcribe.</p>



<p>We can do it thanks to the <strong><em>st.text_input()</em> widget</strong>. The user will be able to <strong>type in the URL</strong> of the video that interests them. Then, we make a <strong>quick verification</strong>: if the entered link seems correct (it contains the pattern of a YouTube link: &#8220;youtu&#8221;), we try to extract the audio from the URL&#8217;s video, and then transcribe it.</p>



<p>This is what the following function does:</p>



<pre class="wp-block-code"><code class="">def transcript_from_url(stt_tokenizer, stt_model):
    
    url = st.text_input("Enter the YouTube video URL then press Enter to confirm!")
    
    # If link seems correct, we try to transcribe
    if "youtu" in url:
        filename = extract_audio_from_yt_video(url)
        if filename is not None:
            transcription(stt_tokenizer, stt_model, filename)
        else:
            st.error("We were unable to extract the audio. Please verify your link, retry or choose another video")</code></pre>



<h4 class="wp-block-heading">4. Transcribe the audio file</h4>



<p>Now, we have to write the functions that <strong>link</strong> the majority of those we have already defined.</p>



<p>To begin, we write the code of the <em><strong>init_transcription()</strong></em> function. It <strong>informs the user</strong> that the <strong>transcription</strong> of the audio file <strong>is starting</strong> and that it will transcribe the audio from <em>start</em> seconds to <em>end</em> seconds. For the moment, these values correspond to the temporal ends of the audio (0s and the audio length). This is not very useful yet, but it will be in the next episode 😌! </p>



<p>This function also initializes some variables. Among them, <em>srt_text</em> and <em>save_result</em> are variables that we will also use in the following article. Do not worry about them for now.</p>



<pre class="wp-block-code"><code class="">def init_transcription(start, end):
    
    st.write("Transcription between", start, "and", end, "seconds in process.\n\n")
    txt_text = ""
    srt_text = ""
    save_result = []
    return txt_text, srt_text, save_result</code></pre>



<p>We have the functions that perform the <strong><em>silences detection</em> method</strong> and that <strong>transcribe an audio file</strong>. But now we need to <strong>link all these functions</strong>. The function <em><strong>transcription_non_diarization() </strong></em>will do it for us:</p>



<pre class="wp-block-code"><code class="">def transcription_non_diarization(filename, myaudio, start, end, srt_token, stt_model, stt_tokenizer, min_space, max_space, save_result, txt_text, srt_text):
    
    # get silences
    silence_list = detect_silences(myaudio)
    if silence_list != []:
        silence_list = get_middle_silence_time(silence_list)
        silence_list = silences_distribution(silence_list, min_space, max_space, start, end, srt_token)
    else:
        silence_list = generate_regular_split_till_end(silence_list, int(end), min_space, max_space)

    # Transcribe each audio chunk (from timestamp to timestamp) and display transcript
    for i in range(0, len(silence_list) - 1):
        sub_start = silence_list[i]
        sub_end = silence_list[i + 1]

        transcription = transcribe_audio_part(filename, stt_model, stt_tokenizer, myaudio, sub_start, sub_end, i)
        
        if transcription != "":
            save_result, txt_text, srt_text = display_transcription(transcription, save_result, txt_text, srt_text, sub_start, sub_end)

    return save_result, txt_text, srt_text</code></pre>



<p>You will notice that this function calls the <strong><em>display_transcription()</em> </strong>function, which displays the right elements according to the <strong>parameters chosen</strong> by the user.</p>



<p>For the moment, the <strong>display</strong> is<strong> basic</strong> since we have not yet added the <strong>user&#8217;s parameters</strong>. This is why we will modify this function in the next article, in order to be able to handle different display cases, depending on the selected parameters.</p>



<p>You can <strong>add it</strong> to your <em>app.py</em> file:</p>



<pre class="wp-block-code"><code class="">def display_transcription(transcription, save_result, txt_text, srt_text, sub_start, sub_end):

    temp_timestamps = str(timedelta(milliseconds=sub_start)).split(".")[0] + " --&gt; " + str(timedelta(milliseconds=sub_end)).split(".")[0] + "\n"        
    temp_list = [temp_timestamps, transcription, int(sub_start / 1000)]
    save_result.append(temp_list)
    st.write(temp_timestamps)    
    st.write(transcription + "\n\n")
    txt_text += transcription + " "  # So x seconds sentences are separated

    return save_result, txt_text, srt_text</code></pre>
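<p>The timestamp header is built with <em>datetime.timedelta</em>: the milliseconds value is converted to an <em>H:MM:SS</em> string, and the fractional part is dropped. A small sketch with hypothetical chunk boundaries:</p>

```python
from datetime import timedelta

# Hypothetical chunk boundaries, in milliseconds
sub_start, sub_end = 61500, 93250

# str(timedelta(...)) gives e.g. "0:01:01.500000"; split(".")[0] drops the fraction
stamp = (str(timedelta(milliseconds=sub_start)).split(".")[0]
         + " --> " + str(timedelta(milliseconds=sub_end)).split(".")[0])

print(stamp)
# 0:01:01 --> 0:01:33
```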



<p>Once this is done, all you have to do is display all the elements and link them using the <strong><em>transcription()</em> </strong>function:</p>



<pre class="wp-block-code"><code class="">def transcription(stt_tokenizer, stt_model, filename, uploaded_file=None):

    # If the audio comes from the YouTube extracting mode, the audio is downloaded so the uploaded_file is
    # the same as the filename. We need to change the uploaded_file which is currently set to None
    if uploaded_file is None:
        uploaded_file = filename

    # Get audio length of the file(s)
    myaudio = AudioSegment.from_file(uploaded_file)
    audio_length = myaudio.duration_seconds
    
    # Display audio file
    st.audio(uploaded_file)

    # Is transcription possible
    if audio_length &gt; 0:
        
        # display a button so the user can launch the transcribe process
        transcript_btn = st.button("Transcribe")

        # if button is clicked
        if transcript_btn:

            # Transcribe process is running
            with st.spinner("We are transcribing your audio. Please wait"):

                # Init variables
                start = 0
                end = audio_length
                txt_text, srt_text, save_result = init_transcription(start, int(end))
                srt_token = False
                min_space = 25000
                max_space = 45000


                # Non Diarization Mode
                filename = "../data/" + filename
                
                # Transcribe process with Non Diarization Mode
                save_result, txt_text, srt_text = transcription_non_diarization(filename, myaudio, start, end, srt_token, stt_model, stt_tokenizer, min_space, max_space, save_result, txt_text, srt_text)

                # Delete files
                clean_directory("../data")  # clean folder that contains generated files

                # Display the final transcript
                if txt_text != "":
                    st.subheader("Final text is")
                    st.write(txt_text)

                else:
                    st.write("Transcription impossible, a problem occurred with your audio or your parameters, we apologize :(")

    else:
        st.error("Seems your audio is 0 s long, please change your file")
        time.sleep(3)
        st.stop()</code></pre>



<p>This huge function acts as our <strong>main block of code</strong>: it <strong>gathers</strong> almost <strong>all the implemented functionalities</strong>. </p>



<p>First of all, it retrieves the length of the audio file and allows the user to play it with a <em>st.audio()</em>, a widget that displays an <strong>audio player</strong>. Then, if the audio length is greater than 0s and the user clicks on the &#8220;<em>Transcribe</em>&#8221; button, the transcription is launched.</p>



<p>The user knows that the <strong>code is running</strong> since all the script is placed in a <em>st.spinner()</em>, which is displayed as a <strong>loading spinner</strong> on the app. </p>



<p>In this code, we initialize some variables. For the moment, we set the <em>srt_token</em> to <em>False</em>, since we are not going to generate subtitles (we will do that in the next tutorials, as mentioned). </p>



<p>Then, the location of the audio file is indicated (remember, it is in our ../data directory). At this point the transcription process really starts, as the <em>transcription_non_diarization()</em> function is called. The audio file is transcribed chunk by chunk, and the transcript is displayed part by part, with the corresponding timestamps. </p>



<p>Once finished, we can <strong>clean up the directory </strong>where all the chunks are located, and the final text is displayed.</p>



<h4 class="wp-block-heading">5. Main</h4>



<p>All that remains is to <strong>define the main</strong>, global architecture of our application.</p>



<p>We just need to create a <em>st.radio</em>() button widget so the user can either <strong>choose</strong> to transcribe their own file by <strong>importing</strong> it, or an external file by <strong>entering the URL</strong> of a video. Depending on the radio button value, we launch the right function (transcript from URL or from file).</p>



<pre class="wp-block-code"><code class="">if __name__ == '__main__':
    config()
    choice = st.radio("Features", ["By a video URL", "By uploading a file"]) 

    stt_tokenizer, stt_model = load_models()
    if choice == "By a video URL":
        transcript_from_url(stt_tokenizer, stt_model)

    elif choice == "By uploading a file":
        transcript_from_file(stt_tokenizer, stt_model)
</code></pre>



<h3 class="wp-block-heading">Run your app!</h3>



<p>We can already <strong>try our program</strong>! To do so, enter the following command in your terminal. The <em>Streamlit</em> application will open in a tab of your browser.</p>



<pre class="wp-block-code"><code class="">streamlit run path_of_your_project/app.py</code></pre>






<p>⚠️⚠️ If this is the first time you are manipulating audio files on your computer, you <strong>may get some OSErrors</strong> about the <em>libsndfile</em>, <em>ffprobe</em> and <em>ffmpeg</em> libraries.</p>



<p>Don&#8217;t worry, you can easily <strong>fix these errors</strong> by installing them. The command will be <strong>different depending on the OS</strong> you are using. For example, on <em>Linux</em>, you can use <em>apt-get</em>:</p>



<pre class="wp-block-code"><code class="">sudo apt-get install libsndfile-dev
sudo apt-get install ffmpeg</code></pre>



<p>If you have <a href="https://anaconda.org/anaconda/conda" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external"><em>Conda</em></a> or <a href="https://docs.conda.io/en/latest/miniconda.html" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external"><em>Miniconda</em></a> installed on your OS, you can use:</p>



<pre class="wp-block-code"><code class="">conda install -c main ffmpeg</code></pre>






<p>If the application launches without error, congratulations 👏 ! You are now able to choose a YouTube video or import your own audio file into the application and get its transcript!</p>



<p>😪 Unfortunately, local resources may not be powerful enough to get a transcript in just a few seconds, which is quite frustrating.</p>



<p>➡️ To <strong>save time</strong>, <em>you can run your app on <strong>GPUs</strong> thanks to </em><strong>AI Deploy</strong><em>. To do this, please refer to this <a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/tuto-streamlit-speech-to-text-app/" data-wpel-link="exclude">documentation</a> to boot it up.</em></p>



<p>You can see what we have built on the following video:</p>



<figure class="wp-block-video"><video height="720" style="aspect-ratio: 1280 / 720;" width="1280" controls src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/Speech_To_Text_demo_HD_part_1.mp4"></video></figure>



<p class="has-text-align-center"><em>Quick demonstration of our Speech-To-Text application after completing this first tutorial</em></p>



<h3 class="wp-block-heading">Conclusion</h3>



<p>Well done 🥳 ! You are now able to import your own audio file on the app and get your first transcript!</p>



<p>You could be satisfied with that, but <strong>we can do so much better</strong>!</p>



<p>Indeed, our <em>Speech-To-Text</em> application is still <strong>very basic</strong>. We need to <strong>implement new functions</strong> like <em>speakers differentiation</em>, <em>transcripts summarization</em>, or <em>punctuation</em>, and also other <strong>essential functionalities</strong> like the possibility to <em>trim/cut an audio</em>, to <em>download the transcript</em>, <em>interact with the timestamps</em>, <em>justify the text</em>, &#8230; </p>



<p>➡️ If you want to <strong>improve</strong> your <em>Streamlit</em> application, <a href="https://blog.ovhcloud.com/how-to-build-a-speech-to-text-application-with-python-2-3/" target="_blank" rel="noreferrer noopener" data-wpel-link="internal">follow the next article</a> 😉.</p>
]]></content:encoded>
					
		
		<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2022/07/Speech_To_Text_demo_HD_part_1.mp4" length="9372511" type="video/mp4" />

			</item>
		<item>
		<title>Deploy a custom Docker image for Data Science project – Streamlit app for EDA and interactive prediction (Part 2)</title>
		<link>https://blog.ovhcloud.com/deploy-a-custom-docker-image-for-data-science-project-streamlit-app-for-eda-and-interactive-prediction-part-2/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 11 Oct 2022 07:38:35 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Solutions]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[PyTorch]]></category>
		<category><![CDATA[Streamlit]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=23479</guid>

					<description><![CDATA[A guide to deploy a custom Docker image for a Streamlit app with AI Deploy. Welcome to the second article concerning custom Docker image deployment. If you haven&#8217;t read the previous one, you can read it on the following link. It was about Gradio and sketch recognition. When creating code for a Data Science project, [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em>A guide to deploy a custom Docker image for a <a href="https://streamlit.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Streamlit</a> app with <strong>AI Deploy</strong>.</em></p>



<figure class="wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-2 is-layout-flex wp-block-gallery-is-layout-flex">
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="817" data-id="23517" src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image3-1024x817.jpeg" alt="streamlit app for eda and interactive prediction" class="wp-image-23517" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image3-1024x817.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image3-300x239.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image3-768x613.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image3-1536x1225.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image3.jpeg 1620w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
</figure>



<p><em>Welcome to the second article concerning <strong>custom Docker image deployment</strong>. If you haven&#8217;t read the previous one, you can read it on the following <a href="https://blog.ovhcloud.com/deploy-a-custom-docker-image-for-data-science-project-gradio-sketch-recognition-app-part-1/" data-wpel-link="internal">link</a>. It was about <a href="https://gradio.app/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gradio</a> and sketch recognition.</em></p>



<p>When creating code for a <strong>Data Science project</strong>, you probably want it to be as portable as possible. In other words, it should run reliably as many times as you like, even on different machines.</p>



<p>Unfortunately, Data Science code often works fine locally but fails at runtime elsewhere, typically because the host machine has different versions of the required libraries installed.</p>



<p>To deal with this problem, you can use <a href="https://www.docker.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Docker</a>.</p>



<p><strong>The article is organized as follows:</strong></p>



<ul class="wp-block-list">
<li>Objectives</li>



<li>Concepts</li>



<li>Load the trained PyTorch model </li>



<li>Build the Streamlit app with Python</li>



<li>Containerize your app with Docker</li>



<li>Launch the app with AI Deploy</li>
</ul>



<p><em>All the code for this blogpost is available in our dedicated <a href="https://github.com/ovh/ai-training-examples/tree/main/apps/streamlit/eda-classification-iris" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>. You can test it with OVHcloud <strong>AI Deploy</strong> tool, please refer to the <a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/tuto-streamlit-eda-iris/" data-wpel-link="exclude">documentation</a> to boot it up.</em></p>



<h2 class="wp-block-heading">Objectives</h2>



<p>In this article, you will learn how to develop a Streamlit app for two Data Science tasks: Exploratory Data&nbsp;Analysis (<strong>EDA</strong>) and interactive prediction based on an ML model.</p>



<p>Once your app is up and running locally, it will be a matter of containerizing it, then deploying the custom Docker image with AI Deploy.</p>



<figure class="wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-3 is-layout-flex wp-block-gallery-is-layout-flex">
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="466" data-id="23521" src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image4-1024x466.jpeg" alt="objective of streamlit app deployment" class="wp-image-23521" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image4-1024x466.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image4-300x137.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image4-768x350.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image4-1536x700.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image4.jpeg 1620w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
</figure>



<h2 class="wp-block-heading">Concepts</h2>



<p>In Artificial Intelligence, you have probably heard about the famous use case of the <a href="https://archive.ics.uci.edu/ml/datasets/iris" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Iris dataset</a>. <strong>How about learning more about it?</strong></p>



<h3 class="wp-block-heading">Iris dataset</h3>



<p><strong>Iris Flower Dataset</strong> is considered as the <em>Hello World</em> for Data Science. The <a href="https://en.wikipedia.org/wiki/Iris_flower_data_set" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Iris Flower Dataset</a> contains <strong>four features</strong> (length and width of sepals and petals) of <strong>50 samples</strong> of <strong>three species</strong> of Iris:</p>



<ul class="wp-block-list">
<li>Iris setosa</li>



<li>Iris virginica</li>



<li>Iris versicolor</li>
</ul>



<p>The dataset is in <code>csv</code> format and you can also find it directly as a <code>dataframe</code>. It contains five columns, namely:</p>



<ul class="wp-block-list">
<li>Petal length</li>



<li>Petal width</li>



<li>Sepal length</li>



<li>Sepal width</li>



<li>Species type</li>
</ul>



<p>The objective of models based on this dataset is to classify the three <strong>Iris species</strong>. The measurements of petals and sepals can be used to build, for example, a <strong>linear discriminant model</strong> that separates the species.</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image0-1024x864.jpeg" alt="iris dataset" class="wp-image-23522" width="646" height="545" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image0-1024x864.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image0-300x253.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image0-768x648.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image0-1536x1297.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image0.jpeg 1592w" sizes="auto, (max-width: 646px) 100vw, 646px" /></figure>



<p>❗ <code><strong>A model to classify Iris species was trained in a previous tutorial, in notebook form, which you can find and test <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/getting-started/pytorch/notebook_classification_iris.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a>.</strong></code></p>



<p>This model is registered in an OVHcloud&nbsp;<a href="https://docs.ovh.com/gb/en/publiccloud/ai/cli/data-cli/" data-wpel-link="exclude">Object Storage container</a>.</p>



<p>In this article, the first objective is to create an app for Exploratory Data&nbsp;Analysis (<strong>EDA</strong>). Then you will see how to obtain interactive prediction.</p>



<h3 class="wp-block-heading">EDA</h3>



<p><strong>What is EDA in Data Science?</strong></p>



<p><a href="https://en.wikipedia.org/wiki/Exploratory_data_analysis" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Exploratory Data Analysis</a> (<strong>EDA</strong>) is an approach to analyzing data through visual methods. In this way, you get detailed information about the statistical summary of the data. </p>



<p>In addition, <strong>EDA</strong> helps you deal with duplicate values and outliers, and reveals trends or patterns present in the dataset.</p>



<p>For the Iris dataset, the aim is to observe the source data on visual graphs using the <strong>Streamlit</strong> tool.</p>



<h3 class="wp-block-heading">Streamlit</h3>



<p><strong>What is Streamlit?</strong></p>



<p><a href="https://streamlit.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Streamlit</a>&nbsp;allows you to transform data scripts into quickly shareable web applications using only the <strong>Python</strong> language. Moreover, this framework does not require front-end skills.</p>



<p>This is a time saver for the data scientist who wants to deploy an app around the world of data!</p>



<p>To make this app accessible, you need to containerize it using&nbsp;<strong>Docker</strong>.</p>



<h3 class="wp-block-heading">Docker</h3>



<p><a href="https://www.docker.com/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Docker</a>&nbsp;platform allows you to build, run and manage isolated applications. The principle is to build an application that contains not only the written code but also all the context needed to run it: libraries and their versions, for example.</p>



<p>When you wrap your application with all its context, you build a Docker image, which can be saved in your local repository or on Docker Hub.</p>



<p>To get started with Docker, please, check this&nbsp;<a href="https://www.docker.com/get-started" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">documentation</a>.</p>



<p>To build a Docker image, you will define two elements:</p>



<ul class="wp-block-list">
<li>the application code (<em>Streamlit app</em>)</li>



<li>the&nbsp;<a href="https://docs.docker.com/engine/reference/builder/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Dockerfile</a></li>
</ul>



<p>In the next steps, you will see how to develop the Python code for your app, but also how to write the Dockerfile.</p>



<p>Finally, you will see how to deploy your custom docker image with&nbsp;<strong>OVHcloud AI Deploy</strong>&nbsp;tool.</p>



<h3 class="wp-block-heading">AI Deploy</h3>



<p><strong>AI Deploy</strong>&nbsp;enables AI models and managed applications to be started via Docker containers.</p>



<p>To know more about AI Deploy, please refer to this&nbsp;<a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/getting-started/" data-wpel-link="exclude">documentation</a>.</p>



<h2 class="wp-block-heading">Load the trained PyTorch model </h2>



<p>❗ <strong><code>To develop an app that uses a machine learning model, you must first load the model in the correct format. For this tutorial, a PyTorch model is used, and the Python file utils.py loads it</code></strong>.</p>



<p>The first step is to import the&nbsp;<strong>Python libraries</strong>&nbsp;needed to load a PyTorch model in the <code>utils.py</code> file.</p>



<pre class="wp-block-code"><code class="">import torch
import torch.nn as nn
import torch.nn.functional as F</code></pre>



<p>To load your <strong>PyTorch model</strong>, it is first necessary to define its model architecture by using the <code>Model</code> class defined previously in the part &#8220;<em>Step 2 &#8211; Define the neural network model</em>&#8221; of the <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/getting-started/pytorch/notebook_classification_iris.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">notebook</a>.</p>



<pre class="wp-block-code"><code class="">class Model(nn.Module):
    def __init__(self):

        super().__init__()
        self.layer1 = nn.Linear(in_features=4, out_features=16)
        self.layer2 = nn.Linear(in_features=16, out_features=12)
        self.output = nn.Linear(in_features=12, out_features=3)

    def forward(self, x):

        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = self.output(x)

        return x</code></pre>



<p>In a second step, you fill in the access path to the model. To save this model in <code>pth</code> format, refer to the part &#8220;<em>Step 6 &#8211; Save the model for future inference</em>&#8221; of the <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/getting-started/pytorch/notebook_classification_iris.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">notebook</a>.</p>



<pre class="wp-block-code"><code class="">path = "model_iris_classification.pth"</code></pre>



<p>Then, the <code>load_checkpoint</code> function is used to load the model&#8217;s checkpoint.</p>



<pre class="wp-block-code"><code class="">def load_checkpoint(path):

    model = Model()
    print("Model display: ", model)
    model.load_state_dict(torch.load(path))
    model.eval()

    return model</code></pre>



<p>Finally, the function <code>load_model</code> is used to load the model and to use it to obtain the result of the prediction.</p>



<pre class="wp-block-code"><code class="">def load_model(X_tensor):

    model = load_checkpoint(path)
    predict_out = model(X_tensor)
    _, predict_y = torch.max(predict_out, 1)

    return predict_out.squeeze().detach().numpy(), predict_y.item()</code></pre>
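<p><em>To see what this pipeline produces end to end, here is a self-contained sketch that runs a forward pass through the same architecture. It uses freshly initialised, untrained weights (no <code>pth</code> checkpoint is loaded), so the predicted class is arbitrary:</em></p>

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    # Same architecture as defined above in utils.py
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(in_features=4, out_features=16)
        self.layer2 = nn.Linear(in_features=16, out_features=12)
        self.output = nn.Linear(in_features=12, out_features=3)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        return self.output(x)

model = Model()
model.eval()

# One sample: sepal length, sepal width, petal length, petal width (cm)
X_tensor = torch.tensor([[5.1, 3.5, 1.4, 0.2]])

with torch.no_grad():
    predict_out = model(X_tensor)             # raw scores, one per species
    _, predict_y = torch.max(predict_out, 1)  # index of the highest score

print(predict_out.shape)   # torch.Size([1, 3])
print(predict_y.item())    # 0, 1 or 2
```

<p><em>With the real checkpoint loaded via <code>load_checkpoint</code>, <code>predict_y.item()</code> would be the actual predicted species index.</em></p>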



<p>Find out the full Python code <a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/eda-classification-iris/utils.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a>.</p>



<p>Have you successfully loaded your model? Good job 🥳 !</p>



<p>Let&#8217;s go for the creation of the Streamlit app!</p>



<h2 class="wp-block-heading">Build the Streamlit app with Python </h2>



<p>❗ <code><strong>All the code below is available in the <em>app.py</em> file. The key functions are explained in this article.<br>However, the "<em>main</em>" part of the <em>app.py</em> file is not described. You can find the complete Python code of the <em>app.py</em> file <a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/eda-classification-iris/app.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a>.</strong></code></p>



<p>To begin, you can import the dependencies for the Streamlit app.</p>



<ul class="wp-block-list">
<li>Numpy</li>



<li>Pandas</li>



<li>Seaborn</li>



<li><code>load_model</code> function from utils.py</li>



<li>Torch</li>



<li>Streamlit</li>



<li>Scikit-Learn</li>



<li>Plotly</li>



<li>PIL</li>
</ul>



<pre class="wp-block-code"><code class="">import numpy as np
import pandas as pd
import seaborn as sns
from utils import load_model
import torch
import streamlit as st
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import plotly.graph_objects as go
import plotly.express as px
from PIL import Image</code></pre>



<p>Then, you must load the source dataset of <strong>Iris flowers</strong> to be able to extract its characteristics and visualize the data. Scikit-Learn allows you to load this dataset without having to download it!</p>



<p>Next, you can separate the dataset in an <strong>input dataframe</strong> and an <strong>output dataframe</strong>.</p>



<p>Finally, this <code>load_data</code> function is cached so that the dataset doesn&#8217;t have to be reloaded every time the app reruns.</p>



<pre class="wp-block-code"><code class="">@st.cache
def load_data():
    dataset_iris = load_iris()
    df_inputs = pd.DataFrame(dataset_iris.data, columns=dataset_iris.feature_names)
    df_output = pd.DataFrame(dataset_iris.target, columns=['variety'])

    return df_inputs, df_output</code></pre>



<p>The creation of this Streamlit app is separated into two parts.</p>



<p>Firstly, you can look into the creation of the EDA part. Then you will see how to create an interactive prediction tool using the PyTorch model.</p>



<h3 class="wp-block-heading">EDA on Iris Dataset</h3>



<p>As a first step, you can look at the source dataset by displaying different graphs using the Python <strong>Seaborn</strong> library.</p>



<p><strong>Seaborn Pairplot</strong> allows you to get the relationship between each pair of variables present in a <strong>Pandas</strong> dataframe. </p>



<p><code>sns.pairplot</code> plots the graph in pairs of several features in a grid format.</p>



<pre class="wp-block-code"><code class="">@st.cache(allow_output_mutation=True)
def data_visualization(df_inputs, df_output):

    df = pd.concat([df_inputs, df_output['variety']], axis=1)
    eda = sns.pairplot(data=df, hue="variety", palette=['#0D0888', '#CB4779', '#F0F922'])

    return eda</code></pre>



<p>Later, this function will display the following graph thanks to a call in the &#8220;<code><em>main</em></code>&#8221; of <code><a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/eda-classification-iris/app.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">app.py</a></code> file.</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-1024x956.png" alt="iris data visualization / eda with sns.pairplot" class="wp-image-23487" width="756" height="706" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-1024x956.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-300x280.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-768x717.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image.png 1460w" sizes="auto, (max-width: 756px) 100vw, 756px" /></figure>



<p>Here it can be seen that the&nbsp;<code><strong>setosa</strong> 0</code>&nbsp;variety is easily separated from the other two (<code><strong>versicolor</strong>&nbsp;1</code>&nbsp;and&nbsp;<code><strong>virginica</strong>&nbsp;2</code>).</p>



<p>Were you able to display your graph? Well done 🎉 !</p>



<p>So, let&#8217;s go to the <strong>interactive prediction</strong> tool 🔜 !</p>



<h3 class="wp-block-heading">Create an interactive prediction tool</h3>



<p>To create an interactive prediction tool, you will need several elements:</p>



<ul class="wp-block-list">
<li>Firstly, you need <strong>four sliders</strong> to play with the input parameters</li>



<li>Secondly, you have to create a function to display the <strong>Principal Component Analysis</strong> (<strong>PCA</strong>) graph to visualize the point corresponding to the output of the model</li>



<li>Thirdly, you can build a <strong>histogram</strong> representing the result of the prediction</li>



<li>Fourthly, you will have a function to <strong>display the image</strong> of the predicted Iris species</li>
</ul>



<p>Ready to go? Let&#8217;s start creating <strong>sliders</strong>!</p>



<h4 class="wp-block-heading">Create a sidebar with sliders for input data</h4>



<p>In order to facilitate the visual reading of the Streamlit app, sliders are added in a <strong>sidebar</strong>.</p>



<p>In this sidebar, four sliders are added so that users can choose the length and width of petals and sepals.</p>



<p><strong>How to create a slider?</strong> Well, nothing could be easier than with Streamlit!</p>



<p>You need to define the function <code>st.sidebar.slider()</code> to <strong>add a slider to the sidebar</strong>. Then you can specify arguments such as the <strong>minimum</strong> and <strong>maximum</strong> values, or the mean value, which will be the default. Finally, you can specify the <strong>step</strong> of your slider.</p>



<p>❗ <code><strong>Here you can see the example for a single slider. Find the complete code of the other sliders on the GitHub repo <a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/eda-classification-iris/app.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a>.</strong></code></p>



<pre class="wp-block-code"><code class="">def create_slider(df_inputs):

    sepal_length = st.sidebar.slider(
        label='Sepal Length',
        min_value=float(df_inputs['sepal length (cm)'].min()),
        max_value=float(df_inputs['sepal length (cm)'].max()),
        value=float(round(df_inputs['sepal length (cm)'].mean(), 1)),
        step=0.1)

    sepal_width = st.sidebar.slider(
        ...
        )

    petal_length = st.sidebar.slider(
        ...
        )

    petal_width = st.sidebar.slider(
        ...
        )

    return sepal_length, sepal_width, petal_length, petal_width</code></pre>



<p>Later, this function will be called in the &#8220;<code><em>main</em></code>&#8221; of the <code><a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/eda-classification-iris/app.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">app.py</a></code> file. Afterwards, you will see the following interface:</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-1.png" alt="Streamlit sidebar with sliders" class="wp-image-23493" width="625" height="665"/></figure>



<p>Thanks to these <strong>sliders</strong>, you can now obtain the result of the prediction in an interactive way by playing on <strong>one or more parameters</strong>.</p>



<h4 class="wp-block-heading">Display PCA graph</h4>



<p>Once your sliders are up and running, you can create a function to display the graph of the <strong>Principal Component Analysis</strong> (<strong>PCA</strong>).</p>



<p><strong>PCA</strong> is a technique that transforms <strong>high-dimensional</strong> data into <strong>lower dimensions</strong> while retaining as much information as possible.</p>



<p><strong>What about the Iris dataset?</strong> The aim is to be able to display the point resulting from the model prediction on a<strong> two-dimensional graph</strong>.</p>



<p>The <code>run_pca</code> function below computes the <strong>two-dimensional</strong> projection of the irises from the source dataset.</p>



<pre class="wp-block-code"><code class="">@st.cache
def run_pca():

    pca = PCA(2)
    X = df_inputs.iloc[:, :4]
    X_pca = pca.fit_transform(X)
    df_pca = pd.DataFrame(pca.transform(X))
    df_pca.columns = ['PC1', 'PC2']
    df_pca = pd.concat([df_pca, df_output['variety']], axis=1)

    return pca, df_pca</code></pre>
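<p><em>The fitted <code>pca</code> object can then project any new sample into the same two-dimensional space, which is how the prediction point ends up on the graph. A minimal sketch (the slider values below are hypothetical):</em></p>

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Rebuild the inputs dataframe, as in load_data above
dataset_iris = load_iris()
df_inputs = pd.DataFrame(dataset_iris.data, columns=dataset_iris.feature_names)

# Fit a 2-component PCA on the four measurements, as in run_pca
pca = PCA(2)
pca.fit(df_inputs.iloc[:, :4])

# Hypothetical slider values: sepal length, sepal width, petal length, petal width (cm)
new_point = pd.DataFrame([[5.8, 3.0, 4.3, 1.3]],
                         columns=dataset_iris.feature_names)
pc1, pc2 = pca.transform(new_point)[0]
print(pc1, pc2)
```

<p><em>The pair <code>(pc1, pc2)</code> gives the coordinates of the black prediction point on the PCA graph.</em></p>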



<p>Thereafter, the black point corresponding to the result of the prediction is placed on the same graph in the &#8220;<code><em>main</em></code>&#8221; of the Python <a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/eda-classification-iris/app.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><code>app.py</code></a> file.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="700" height="450" src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/newplot-1-1.png" alt="Principal Component Analysis (PCA) Iris dataset" class="wp-image-23498" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/10/newplot-1-1.png 700w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/newplot-1-1-300x193.png 300w" sizes="auto, (max-width: 700px) 100vw, 700px" /></figure>



<p>With this method you were able to visualize your point in space. However, the numerical result of the prediction is not filled in.</p>



<p>Therefore, you can also display the results as a histogram.</p>



<h4 class="wp-block-heading">Return predictions histogram</h4>



<p>At the output of the neural network, the results can be <strong>positive or negative</strong> and the highest value corresponds to the iris species predicted by the model.</p>



<p>To create a histogram, negative values can be removed. To do this, the predictions with <strong>positive values</strong> are extracted and sent to a list before being transformed into a dataframe.</p>



<p>The negative values are all replaced by zero.</p>



<p>To summarize, the <code>extract_positive_value</code> function can be translated into the following mathematical formula: <br><code>f(prediction) = max(0, prediction)</code></p>



<pre class="wp-block-code"><code class="">def extract_positive_value(prediction):

    prediction_positive = []
    for p in prediction:
        if p &lt; 0:
            p = 0
        prediction_positive.append(p)

    return pd.DataFrame({'Species': ['Setosa', 'Versicolor', 'Virginica'], 'Confidence': prediction_positive})</code></pre>
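<p><em>To see what the function returns, here is a quick usage sketch with made-up logits (the function body is restated so the snippet is self-contained):</em></p>

```python
import pandas as pd

def extract_positive_value(prediction):
    # Same logic as above: f(prediction) = max(0, prediction)
    prediction_positive = []
    for p in prediction:
        if p < 0:
            p = 0
        prediction_positive.append(p)

    return pd.DataFrame({'Species': ['Setosa', 'Versicolor', 'Virginica'],
                         'Confidence': prediction_positive})

# Hypothetical raw model outputs (logits) for one sample
df_pred = extract_positive_value([2.4, -1.1, 0.3])
print(df_pred)
```

<p><em>Here the negative Versicolor score is clamped to zero, and the highest remaining bar (Setosa) corresponds to the predicted species.</em></p>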



<p>This function is then called to build the histogram in the &#8220;<code><em>main</em></code>&#8221; of the Python <a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/eda-classification-iris/app.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><code>app.py</code></a> file. The <code>plotly</code> library allows you to build this <strong>bar chart</strong> as follows.</p>



<pre class="wp-block-code"><code class="">fig = px.bar(extract_positive_value(prediction), x='Species', y='Confidence', width=400, height=400, color='Species', color_discrete_sequence=['#0D0888', '#CB4779', '#F0F922'])</code></pre>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/newplot-2.png" alt="Histogram prediction iris species" class="wp-image-23499" width="388" height="388" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/10/newplot-2.png 400w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/newplot-2-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/newplot-2-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/newplot-2-70x70.png 70w" sizes="auto, (max-width: 388px) 100vw, 388px" /></figure>



<h4 class="wp-block-heading">Show Iris species image</h4>



<p>The final step is to display the predicted iris image using a <strong>Streamlit button</strong>. Therefore, you can define the <code>display_img</code> function to select the correct image based on the prediction.</p>



<pre class="wp-block-code"><code class="">def display_img(species):

    list_img = ['setosa.png', 'versicolor.png', 'virginica.png']

    return Image.open(list_img[species])</code></pre>



<p>Finally, in the main Python code <code><a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/eda-classification-iris/app.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">app.py</a></code>, <code>st.image()</code> displays the image when the user requests it by pressing the &#8220;<code>Show flower image</code>&#8221; button.</p>



<pre class="wp-block-code"><code class="">if st.button('Show flower image'):
    st.image(display_img(species), width=300)
    st.write(df_pred.iloc[species, 0])</code></pre>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="347" height="327" src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-2.png" alt="Streamlit button and image displayed" class="wp-image-23500" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-2.png 347w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-2-300x283.png 300w" sizes="auto, (max-width: 347px) 100vw, 347px" /></figure>



<p><code><strong>❗ Again, you can find the full code <a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/eda-classification-iris/app.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a></strong></code>.</p>



<p>Before deploying your Streamlit app, you can test it locally using the following command:</p>



<pre class="wp-block-code"><code class="">streamlit run app.py</code></pre>



<p>Then, you can test your app locally at the following address:&nbsp;<strong>http://localhost:8501/</strong></p>



<p>Your app works locally? Congratulations&nbsp;🎉 !</p>



<p>Now it’s time to move on to containerization!</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="883" height="975" src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-5.png" alt="overview streamlit app for eda and prediction on iris data" class="wp-image-23508" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-5.png 883w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-5-272x300.png 272w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image-5-768x848.png 768w" sizes="auto, (max-width: 883px) 100vw, 883px" /></figure>



<h2 class="wp-block-heading">Containerize your app with Docker</h2>



<p>First of all, you have to build the file that will contain the different Python modules to be installed with their corresponding version.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="545" src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image5-1024x545.jpeg" alt="docker image data science" class="wp-image-23518" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/10/image5-1024x545.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image5-300x160.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image5-768x409.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image5-1536x818.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/10/image5.jpeg 1591w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading">Create the requirements.txt file</h3>



<p>The&nbsp;<code><a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/eda-classification-iris/requirements.txt" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">requirements.txt</a></code>&nbsp;file will allow us to write all the modules needed to make our application work.</p>



<pre class="wp-block-code"><code class="">pandas==1.4.4
numpy==1.23.2
torch==1.12.1
streamlit==1.12.2
scikit-learn==1.1.2
plotly==5.10.0
Pillow==9.2.0
seaborn==0.12.0</code></pre>



<p>This file will be useful when writing the&nbsp;<code>Dockerfile</code>.</p>



<h3 class="wp-block-heading">Write the Dockerfile</h3>



<p>Your&nbsp;<code><a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/eda-classification-iris/Dockerfile" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Dockerfile</a></code>&nbsp;should start with the&nbsp;<code>FROM</code>&nbsp;instruction indicating the parent image to use. In our case we choose to start from a classic Python image.</p>



<p>For this Streamlit app, you can use version&nbsp;<strong>3.8</strong>&nbsp;of Python.</p>



<pre class="wp-block-code"><code class="">FROM python:3.8</code></pre>



<p>Next, you have to fill in the working directory and add all your files into it.</p>



<p><code><strong>❗&nbsp;Here you must be in the /workspace directory. This is the base directory for launching an OVHcloud AI Deploy app.</strong></code></p>



<pre class="wp-block-code"><code class="">WORKDIR /workspace
ADD . /workspace</code></pre>



<p>Install the&nbsp;<code>requirements.txt</code>&nbsp;file which contains your needed Python modules using a&nbsp;<code>pip install…</code>&nbsp;command:</p>



<pre class="wp-block-code"><code class="">RUN pip install -r requirements.txt</code></pre>



<p>Then, you can give correct access rights to OVHcloud user (<code>42420:42420</code>).</p>



<pre class="wp-block-code"><code class="">RUN chown -R 42420:42420 /workspace
ENV HOME=/workspace</code></pre>



<p>Finally, you have to define your default launching command to start the application.</p>



<pre class="wp-block-code"><code class="">CMD [ "streamlit", "run", "/workspace/app.py", "--server.address=0.0.0.0" ]</code></pre>



<p>Once your&nbsp;<code>Dockerfile</code>&nbsp;is defined, you will be able to build your custom docker image.</p>
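<p>Putting the previous instructions together, the complete&nbsp;<code>Dockerfile</code>&nbsp;for this Streamlit app looks like this:</p>

<pre class="wp-block-code"><code class="">FROM python:3.8

WORKDIR /workspace
ADD . /workspace

RUN pip install -r requirements.txt

RUN chown -R 42420:42420 /workspace
ENV HOME=/workspace

CMD [ "streamlit", "run", "/workspace/app.py", "--server.address=0.0.0.0" ]</code></pre>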



<h3 class="wp-block-heading">Build the Docker image from the Dockerfile</h3>



<p>First, you can launch the following command from the&nbsp;<code>Dockerfile</code>&nbsp;directory to build your application image.</p>



<pre class="wp-block-code"><code class="">docker build . -t streamlit-eda-iris:latest</code></pre>



<p>⚠️&nbsp;<strong><code>The dot . argument indicates that your build context (the location of the Dockerfile and other needed files) is the current directory.</code></strong></p>



<p>⚠️&nbsp;<code><strong>The -t argument allows you to choose the identifier to give to your image. Usually image identifiers are composed of a name and a version tag &lt;name&gt;:&lt;version&gt;. For this example we chose streamlit-eda-iris:latest.</strong></code></p>



<h3 class="wp-block-heading">Test it locally</h3>



<p>Now, you can run the following&nbsp;<strong>Docker command</strong>&nbsp;to launch your application locally on your computer.</p>



<pre class="wp-block-code"><code class="">docker run --rm -it -p 8501:8501 --user=42420:42420 streamlit-eda-iris:latest</code></pre>



<p>⚠️&nbsp;<code><strong>The -p 8501:8501 argument indicates that you want to redirect port 8501 of your local machine to port 8501 of the Docker container.</strong></code></p>



<p>⚠️<code><strong>&nbsp;Don't forget the --user=42420:42420 argument if you want to simulate the exact same behaviour that will occur on AI Deploy. It executes the Docker container as the specific OVHcloud user (user 42420:42420).</strong></code></p>



<p>Once started, your application should be available on&nbsp;<strong>http://localhost:8501</strong>.<br><br>Your Docker image seems to work? Good job&nbsp;👍 !<br><br>It’s time to push it and deploy it!</p>



<h3 class="wp-block-heading">Push the image into the shared registry</h3>



<p>❗&nbsp;The shared registry of AI Deploy should only be used for testing purposes. Please consider attaching your own Docker registry. More information about this can be found&nbsp;<a href="https://docs.ovh.com/asia/en/publiccloud/ai/training/add-private-registry/" data-wpel-link="exclude">here</a>.</p>



<p>Then, you have to find the address of your&nbsp;<code>shared registry</code>&nbsp;by launching this command.</p>



<pre class="wp-block-code"><code class="">ovhai registry list</code></pre>



<p>Next, log in on the shared registry with your usual&nbsp;<code>OpenStack</code>&nbsp;credentials.</p>



<pre class="wp-block-code"><code class="">docker login -u &lt;user&gt; -p &lt;password&gt; &lt;shared-registry-address&gt;</code></pre>



<p>To finish, you need to push the created image into the shared registry.</p>



<pre class="wp-block-code"><code class="">docker tag streamlit-eda-iris:latest &lt;shared-registry-address&gt;/streamlit-eda-iris:latest
docker push &lt;shared-registry-address&gt;/streamlit-eda-iris:latest</code></pre>



<p>Once you have pushed your custom docker image into the shared registry, you are ready to launch your app 🚀 !</p>



<h2 class="wp-block-heading">Launch the AI Deploy app</h2>



<p>The following command starts a new job running your Streamlit application.</p>



<pre class="wp-block-code"><code class="">ovhai app run \
      --default-http-port 8501 \
      --cpu 12 \
      &lt;shared-registry-address&gt;/streamlit-eda-iris:latest</code></pre>



<h3 class="wp-block-heading">Choose the compute resources</h3>



<p>First, you can choose either the number of GPUs or the number of CPUs for your app.</p>



<p><code><strong>--cpu 12</strong></code>&nbsp;indicates that we request 12 CPUs for that app.</p>



<p>If you want, you can also launch this app with one or more&nbsp;<strong>GPUs</strong>.</p>
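<p>For example, the same app could be requested with one GPU instead of CPUs. The resource flag shown here is indicative; check&nbsp;<code>ovhai app run --help</code>&nbsp;for the exact options available in your CLI version.</p>

<pre class="wp-block-code"><code class="">ovhai app run \
      --default-http-port 8501 \
      --gpu 1 \
      &lt;shared-registry-address&gt;/streamlit-eda-iris:latest</code></pre>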



<h3 class="wp-block-heading">Make the app public</h3>



<p>Finally, add the&nbsp;<code><strong>--unsecure-http</strong></code>&nbsp;attribute if you want your application to be reachable without any authentication.</p>



<figure class="wp-block-video"><video height="998" style="aspect-ratio: 1917 / 998;" width="1917" controls src="https://blog.ovhcloud.com/wp-content/uploads/2022/10/Enregistrement-de-lécran-2022-10-05-à-11.52.19-1.mov"></video></figure>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Well done 🎉&nbsp;! You have learned how to build your&nbsp;<strong>own Docker image</strong>&nbsp;for a dedicated&nbsp;<strong>EDA and interactive prediction app</strong>!</p>



<p>You have also been able to deploy this app thanks to&nbsp;<strong>OVHcloud’s AI Deploy</strong>&nbsp;tool.</p>



<p><em>In a third article, you will see how it is possible to deploy a Data Science project with an API for&nbsp;Spam classification.</em></p>



<h3 class="wp-block-heading" id="want-to-find-out-more">Want to find out more?</h3>



<h5 class="wp-block-heading"><strong>Notebook</strong></h5>



<p>You want to access the notebook? Refer to the&nbsp;<a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/getting-started/pytorch/notebook_classification_iris.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.</p>



<h5 class="wp-block-heading"><strong>App</strong></h5>



<p>You want to access the full code to create the Streamlit app? Refer to the&nbsp;<a href="https://github.com/ovh/ai-training-examples/tree/main/apps/streamlit/eda-classification-iris" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>.<br><br>To launch and test this app with&nbsp;<strong>AI Deploy</strong>, please refer to&nbsp;our&nbsp;<a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/tuto-streamlit-eda-iris/" data-wpel-link="exclude">documentation</a>.</p>



<h2 class="wp-block-heading">References</h2>



<ul class="wp-block-list">
<li><a href="https://towardsdatascience.com/how-to-run-a-data-science-project-in-a-docker-container-2ab1a3baa889" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">How to Run a Data Science Project in a Docker Container</a></li>



<li><a href="https://medium.com/geekculture/create-a-machine-learning-web-app-with-streamlit-f28c75f9f40f" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Create a Machine Learning Web App with Streamlit</a></li>
</ul>
]]></content:encoded>
					
		
		<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2022/10/Enregistrement-de-lécran-2022-10-05-à-11.52.19-1.mov" length="6370587" type="video/quicktime" />

			</item>
		<item>
		<title>Deploy a custom Docker image for Data Science project &#8211; Gradio sketch recognition app (Part 1)</title>
		<link>https://blog.ovhcloud.com/deploy-a-custom-docker-image-for-data-science-project-gradio-sketch-recognition-app-part-1/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 20 Sep 2022 14:30:11 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Apps]]></category>
		<category><![CDATA[AI Solutions]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[gradio]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[tensorflow]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=23056</guid>

<description><![CDATA[A guide to deploy a custom Docker image for a Gradio app with AI Deploy. When creating code for a Data Science project, you probably want it to be as portable as possible. In other words, it can be run as many times as you like, even on different machines. Unfortunately, it is often the [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em>A guide to deploy a custom Docker image for a <a href="https://gradio.app/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gradio</a><strong> </strong>app with <strong>AI Deploy</strong>. </em></p>



<figure class="wp-block-image alignfull size-large"><img loading="lazy" decoding="async" width="1024" height="814" src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-blog-2-1024x814.jpeg" alt="Deploy a custom Docker image for Data Science project - Gradio sketch recognition app (Part 1)" class="wp-image-23192" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-blog-2-1024x814.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-blog-2-300x238.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-blog-2-768x610.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-blog-2-1536x1221.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-blog-2.jpeg 1573w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>When creating code for a <strong>Data Science project</strong>, you probably want it to be as portable as possible. In other words, you want it to run as many times as you like, even on different machines.</p>



<p>Unfortunately, it is often the case that Data Science code works fine locally on one machine but throws errors at runtime on another. This can be due to different versions of the libraries installed on the host machine.</p>



<p>To deal with this problem, you can use <a href="https://www.docker.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Docker</a>.</p>



<p><strong>The article is organized as follows:</strong></p>



<ul class="wp-block-list"><li>Objectives</li><li>Concepts</li><li>Build the Gradio app with Python</li><li>Containerize your app with Docker</li><li>Launch the app with AI Deploy</li></ul>



<p><em>All the code for this blogpost is available in our dedicated <a href="https://github.com/ovh/ai-training-examples/tree/main/apps/gradio/sketch-recognition" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>. You can test it with OVHcloud <strong>AI Deploy</strong> tool, please refer to the <a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/tuto-gradio-sketch-recognition/" data-wpel-link="exclude">documentation</a> to boot it up.</em></p>



<h2 class="wp-block-heading">Objectives</h2>



<p>In this article, you will learn how to develop your first Gradio sketch recognition app based on an existing ML model.</p>



<p>Once your app is up and running locally, it will be a matter of containerizing it, then deploying the custom Docker image with AI Deploy.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="513" src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-objectives-1024x513.jpeg" alt="Objectives to create a Gradio app" class="wp-image-23189" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-objectives-1024x513.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-objectives-300x150.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-objectives-768x385.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-objectives-1536x770.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-objectives.jpeg 1620w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading">Concepts</h2>



<p><strong>In Artificial Intelligence, you probably hear about Computer Vision, but do you know what it is?</strong></p>



<p>Computer Vision is a branch of AI that aims to enable computers to interpret visual data (images for example) to extract information.</p>



<p>There are different tasks in computer vision:</p>



<ul class="wp-block-list"><li>Image classification</li><li>Object detection</li><li>Instance Segmentation</li></ul>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-computer-vision-1024x576.jpeg" alt="Computer vision principle" class="wp-image-23126" width="848" height="477" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-computer-vision-1024x576.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-computer-vision-300x169.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-computer-vision-768x432.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-computer-vision-1536x864.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-computer-vision.jpeg 1620w" sizes="auto, (max-width: 848px) 100vw, 848px" /></figure>



<p>Today we are interested in <strong>image recognition</strong> and more specifically in <strong>sketch recognition</strong> using a dataset of handwritten digits.</p>



<h3 class="wp-block-heading">MNIST dataset</h3>



<p>MNIST is a dataset developed by <a href="https://en.wikipedia.org/wiki/Yann_LeCun" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Yann LeCun</a>, <a href="https://en.wikipedia.org/wiki/Corinna_Cortes" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Corinna Cortes</a> and <a href="https://chrisburges.net/bio/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Christopher Burges</a> to evaluate <strong>Machine Learning</strong> models for <strong>handwritten digits</strong> classification.</p>



<p>The dataset was constructed from a number of digitized document datasets available from the <strong>National Institute of Standards and Technology </strong>(NIST).</p>



<p>The images of numbers are <strong>digitized</strong>, <strong>normalized</strong> and <strong>centered</strong>. This allows the developer to focus on machine learning with very little data cleaning.</p>



<p>Each image is a square of <strong>28</strong> by <strong>28</strong> pixels. The dataset is split in two with <strong>60,000 images</strong> for model training and <strong>10,000 images</strong> for testing it.</p>



<p>This is a digit recognition task to recognize <strong>10 digits</strong>, from 0 to 9.</p>



<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="266" height="264" src="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-mnist.jpeg" alt="MNIST dataset" class="wp-image-23125" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-mnist.jpeg 266w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-mnist-150x150.jpeg 150w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-mnist-70x70.jpeg 70w" sizes="auto, (max-width: 266px) 100vw, 266px" /></figure>



<p>❗<code><strong>A model to classify images of handwritten figures was trained in a previous tutorial, in notebook form, which you can find and test <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/computer-vision/image-classification/tensorflow/weights-and-biases/notebook_Weights_and_Biases_MNIST.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a>.</strong></code></p>



<p>This model is registered in an OVHcloud <a href="https://docs.ovh.com/gb/en/publiccloud/ai/cli/data-cli/" data-wpel-link="exclude">Object Storage container</a>. </p>



<h3 class="wp-block-heading">Sketch recognition</h3>



<p><strong>Have you ever heard of sketch recognition in AI?</strong></p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p><strong>Sketch recognition</strong>&nbsp;is the automated recognition of <strong>hand-drawn&nbsp;diagrams</strong>&nbsp;by a&nbsp;computer.&nbsp;Research in sketch recognition lies at the crossroads of&nbsp;<strong><a href="https://en.wikipedia.org/wiki/Artificial_intelligence" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">artificial intelligence</a></strong>&nbsp;and&nbsp;<strong>human–computer</strong> interaction. Recognition algorithms usually are&nbsp;gesture-based, appearance-based,&nbsp;geometry-based, or a combination thereof.</p><cite>Wikipedia</cite></blockquote>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-sketch-recognition-1-1024x673.jpeg" alt="AI for Sketch Recognition " class="wp-image-23138" width="648" height="424"/></figure>



<p>In this article, <strong>Gradio</strong> will allow you to create your first sketch recognition app.</p>



<h3 class="wp-block-heading">Gradio</h3>



<p><strong>What is Gradio?</strong></p>



<p><a href="https://gradio.app/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gradio</a> allows you to create and share <strong>Machine Learning apps</strong>.</p>



<p>It&#8217;s a quick way to demonstrate your Machine Learning model with a user-friendly web interface so that anyone can use it.</p>



<p>Gradio offers the ability to quickly create a <strong>sketch recognition interface</strong> by specifying &#8220;<code>sketchpad</code>&#8221; as the input.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="303" src="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-interface-1024x303.png" alt="Gradio app drawing space " class="wp-image-23114" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-interface-1024x303.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-interface-300x89.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-interface-768x227.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-interface.png 1118w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To make this app accessible, you need to containerize it using <strong>Docker</strong>.</p>



<h3 class="wp-block-heading">Docker</h3>



<p>The <a href="https://www.docker.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Docker</a> platform allows you to build, run and manage isolated applications. The principle is to build an application that contains not only the written code but also all the context needed to run it: for example, libraries and their versions.</p>



<p>When you wrap your application with all its context, you build a Docker image, which can be saved in your local repository or in the Docker Hub.</p>



<p>To get started with Docker, please, check this <a href="https://www.docker.com/get-started" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a>.</p>



<p>To build a Docker image, you will define 2 elements:</p>



<ul class="wp-block-list"><li>the application code (<em>Gradio sketch recognition app</em>)</li><li>the <a href="https://docs.docker.com/engine/reference/builder/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Dockerfile</a></li></ul>



<p>In the next steps, you will see how to develop the Python code for your app, but also how to write the Dockerfile.</p>



<p>Finally, you will see how to deploy your custom docker image with <strong>OVHcloud AI Deploy</strong> tool.</p>



<h3 class="wp-block-heading">AI Deploy</h3>



<p><strong>AI Deploy</strong> enables AI models and managed applications to be started via Docker containers. </p>



<p>To know more about AI Deploy, please refer to this <a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/getting-started/" data-wpel-link="exclude">documentation</a>.</p>



<h2 class="wp-block-heading">Build the Gradio app with Python</h2>



<h3 class="wp-block-heading">Import Python dependencies </h3>



<p>The first step is to import the <strong>Python libraries</strong> needed to run the Gradio app.</p>



<ul class="wp-block-list"><li>Gradio</li><li>TensorFlow</li><li>OpenCV</li></ul>



<pre class="wp-block-code"><code class="">import gradio as gr
import tensorflow as tf
import cv2</code></pre>



<h3 class="wp-block-heading">Define fixed elements of the app</h3>



<p>With <strong>Gradio</strong>, it is possible to add a title to your app to give information on its purpose.</p>



<pre class="wp-block-code"><code class="">title = "Welcome on your first sketch recognition app!"</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="71" src="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-title-1024x71.png" alt="Gradio app title" class="wp-image-23109" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-title-1024x71.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-title-300x21.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-title-768x53.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-title.png 1118w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, you can describe your app by adding an image and a &#8220;<strong>description</strong>&#8220;.<br><br>To display and centre an image or text, an <strong>HTML tag</strong> is ideal 💡!</p>



<pre class="wp-block-code"><code class="">head = (
  "&lt;center&gt;"
  "&lt;img src='file/mnist-classes.png' width=400&gt;"
  "The robot was trained to classify numbers (from 0 to 9). To test it, write your number in the space provided."
  "&lt;/center&gt;"
)</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="235" src="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-description-1024x235.png" alt="Gradio app description" class="wp-image-23111" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-description-1024x235.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-description-300x69.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-description-768x177.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-description.png 1118w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>It is also possible to share a useful link (source code, documentation, …). You can do it with the Gradio attribute named &#8220;<strong>article</strong>&#8220;.</p>



<pre class="wp-block-code"><code class="">ref = "Find the whole code [here](https://github.com/ovh/ai-training-examples/tree/main/apps/gradio/sketch-recognition)."</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="54" src="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-reference-1024x54.png" alt="Gradio app references" class="wp-image-23112" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-reference-1024x54.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-reference-300x16.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-reference-768x41.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-reference.png 1118w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>For this application, you have to set some variables.</p>



<ul class="wp-block-list"><li><strong>The image size</strong></li></ul>



<p>The image size is set to <strong>28</strong>, since the model expects a <strong>28&#215;28</strong> image as input.</p>
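<p>To make the expected tensor shape concrete, here is a quick standalone check using only NumPy (no model required). It shows the batch and channel dimensions a Keras image model of this kind typically expects:</p>

<pre class="wp-block-code"><code class="">import numpy as np

# A blank 28x28 grayscale image, like a resized sketchpad drawing
img = np.zeros((28, 28))

# Add batch and channel dimensions to match the model input
batch = img.reshape(1, 28, 28, 1)

print(batch.shape)  # (1, 28, 28, 1)</code></pre>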



<ul class="wp-block-list"><li><strong>The classes list</strong></li></ul>



<p>The classes list is composed of ten strings corresponding to the numbers 0 to 9 written in full.</p>



<pre class="wp-block-code"><code class="">img_size = 28
labels = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]</code></pre>



<p>Once the image size has been set and the list of classes defined, the next step is to load the AI model.</p>



<h3 class="wp-block-heading">Load TensorFlow model</h3>



<p>This is a <strong>TensorFlow model</strong> saved and exported beforehand in <code>model.h5</code> format.</p>



<p>Indeed, Keras provides a basic saving format using the <strong>HDF5</strong> standard.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p><strong>Hierarchical Data Format</strong>&nbsp;(HDF) is a set of&nbsp;file formats&nbsp;(HDF4,&nbsp;<strong>HDF5</strong>) designed to store and organize large amounts of data.</p><cite>Wikipedia</cite></blockquote>



<p>In a previous notebook, you exported the model to an <a href="https://www.ovhcloud.com/fr/public-cloud/object-storage/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Object Storage</a> container. If you want to test the notebook, please refer to the <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/computer-vision/image-classification/tensorflow/weights-and-biases/notebook_Weights_and_Biases_MNIST.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.</p>



<pre class="wp-block-code"><code class="">model.save('model/sketch_recognition_numbers_model.h5')</code></pre>



<p>To load this model again and use it for inference, without having to re-train it, you have to use the <code>load_model</code> function from Keras.</p>



<pre class="wp-block-code"><code class="">model = tf.keras.models.load_model("model/sketch_recognition_numbers_model.h5")</code></pre>



<p>After defining the different parameters and loading the model, you can define the function that will predict what you have drawn.</p>



<h3 class="wp-block-heading">Define the prediction function</h3>



<p>This function consists of several steps.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-predict-2-1024x365.jpeg" alt="Gradio app return the results top 3" class="wp-image-23139" width="892" height="318" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-predict-2-1024x365.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-predict-2-300x107.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-predict-2-768x274.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-predict-2-1536x548.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-predict-2.jpeg 1620w" sizes="auto, (max-width: 892px) 100vw, 892px" /></figure>



<pre class="wp-block-code"><code class="">def predict(img):

  # Resize the sketchpad drawing to the 28x28 size expected by the model
  img = cv2.resize(img, (img_size, img_size))
  # Add batch and channel dimensions: (1, 28, 28, 1)
  img = img.reshape(1, img_size, img_size, 1)

  # Probability for each of the 10 classes
  preds = model.predict(img)[0]

  # Map each class name to its confidence score
  return {label: float(pred) for label, pred in zip(labels, preds)}


# Display only the 3 most likely classes in the output component
label = gr.outputs.Label(num_top_classes=3)</code></pre>
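<p>To see what the <code>predict</code> function returns, here is a minimal standalone illustration (with dummy scores, no model needed) of how the dictionary comprehension pairs each class name with its confidence:</p>

<pre class="wp-block-code"><code class="">labels = ["zero", "one", "two"]
preds = [0.05, 0.9, 0.05]

# Same mapping as in predict(): class name -> confidence score
scores = {label: float(pred) for label, pred in zip(labels, preds)}

print(scores)  # {'zero': 0.05, 'one': 0.9, 'two': 0.05}</code></pre>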



<h3 class="wp-block-heading">Launch Gradio interface</h3>



<p>Now you need to build the interface using a Python class, named <a href="https://gradio.app/docs/#interface" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Interface</a>, previously defined by Gradio.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>The <strong>Interface class</strong> is a high-level abstraction that allows you to create a web-based demo around a machine learning model or arbitrary Python function by specifying: <br>(1) the function <br>(2) the desired input components<br>(3) desired output components.</p><cite>Gradio</cite></blockquote>



<pre class="wp-block-code"><code class="">interface = gr.Interface(fn=predict, inputs="sketchpad", outputs=label, title=title, description=head, article=ref)</code></pre>



<p>Finally, you have to launch the Gradio app with the &#8220;<strong>launch</strong>&#8221; method. It starts a simple web server that serves the demo.</p>



<pre class="wp-block-code"><code class="">interface.launch(server_name="0.0.0.0", server_port=8080)</code></pre>



<p>Then, you can test your app locally at the following address: <strong>http://localhost:8080/</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="663" src="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-overview-1024x663.png" alt="Global overview of Gradio app" class="wp-image-23113" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-overview-1024x663.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-overview-300x194.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-overview-768x497.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-overview.png 1118w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Your app works locally? Congratulations 🎉!<br><br>Now it&#8217;s time to move on to containerization!</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-docker-1024x615.jpeg" alt="Docker - Gradio sketch recognition app
" class="wp-image-23120" width="671" height="403" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-docker-1024x615.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-docker-300x180.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-docker-768x461.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/06/gradio-docker.jpeg 1168w" sizes="auto, (max-width: 671px) 100vw, 671px" /></figure>



<h2 class="wp-block-heading">Containerize your app with Docker</h2>



<p>First of all, you have to build the file that will contain the different Python modules to be installed with their corresponding version.</p>



<h3 class="wp-block-heading">Create the requirements.txt file</h3>



<p>The <code>requirements.txt</code> file lists all the Python modules needed to make our application work.</p>



<pre class="wp-block-code"><code class="">gradio==3.0.10
tensorflow==2.9.1
opencv-python-headless==4.6.0.66</code></pre>



<p>This file will be useful when writing the <code>Dockerfile</code>.</p>



<h3 class="wp-block-heading">Write the Dockerfile</h3>



<p>Your <code>Dockerfile</code> should start with the <code>FROM</code> instruction indicating the parent image to use. In our case, we choose to start from a classic Python image.<br><br>For this Gradio app, you can use version <strong>3.7</strong> of Python.</p>



<pre class="wp-block-code"><code class="">FROM python:3.7</code></pre>



<p>Next, you have to fill in the working directory and add the <code>requirements.txt</code> file.</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:100%">
<div class="inherit-container-width wp-block-group is-layout-constrained wp-block-group-is-layout-constrained"><div class="wp-block-group__inner-container">
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:100%">
<div class="inherit-container-width wp-block-group is-layout-constrained wp-block-group-is-layout-constrained"><div class="wp-block-group__inner-container">
<p><code><strong>❗ Here you must be in the /workspace directory. This is the default working directory when launching an OVHcloud AI Deploy app.</strong></code></p>
</div></div>
</div>
</div>
</div></div>
</div>
</div>



<pre class="wp-block-code"><code class="">WORKDIR /workspace
ADD requirements.txt /workspace/requirements.txt</code></pre>



<p>Install the Python modules listed in the <code>requirements.txt</code> file using a <code>pip install</code> command:</p>



<pre class="wp-block-code"><code class="">RUN pip install -r requirements.txt</code></pre>



<p>Now, you have to add your Python file (<code>app.py</code>), as well as the image used in the description of your app, to the <code>workspace</code>.</p>



<pre class="wp-block-code"><code class="">ADD app.py mnist-classes.png /workspace/</code></pre>



<p>Then, you can give the correct access rights to the OVHcloud user (<code>42420:42420</code>).</p>



<pre class="wp-block-code"><code class="">RUN chown -R 42420:42420 /workspace
ENV HOME=/workspace</code></pre>



<p>Finally, you have to define your default launching command to start the application.</p>



<pre class="wp-block-code"><code class="">CMD [ "python3" , "/workspace/app.py" ]</code></pre>



<p>Once your <code>Dockerfile</code> is defined, you will be able to build your custom docker image.</p>
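<p>For reference, here is the complete <code>Dockerfile</code> assembled from the steps above:</p>

```dockerfile
FROM python:3.7

WORKDIR /workspace
ADD requirements.txt /workspace/requirements.txt
RUN pip install -r requirements.txt

ADD app.py mnist-classes.png /workspace/

RUN chown -R 42420:42420 /workspace
ENV HOME=/workspace

CMD [ "python3" , "/workspace/app.py" ]
```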



<h3 class="wp-block-heading">Build the Docker image from the Dockerfile</h3>



<p>First, you can launch the following command from the <code>Dockerfile</code> directory to build your application image.</p>



<pre class="wp-block-code"><code class="">docker build . -t gradio_app:latest</code></pre>



<p><code>⚠️</code> <strong><code>The dot . argument indicates that your build context (place of the Dockerfile and other needed files) is the current directory.</code></strong></p>



<p><code>⚠️</code> <code><strong>The -t argument allows you to choose the identifier to give to your image. Usually image identifiers are composed of a name and a version tag &lt;name&gt;:&lt;version&gt;. For this example we chose gradio_app:latest.</strong></code></p>



<h3 class="wp-block-heading">Test it locally</h3>



<p><code><strong>❗ If you are testing your app locally, you can download your model (sketch_recognition_numbers_model.h5), then add it to the /workspace directory.</strong></code></p>



<p>You can do it via the Dockerfile with the following line:</p>



<pre class="wp-block-code"><code class="">ADD sketch_recognition_numbers_model.h5 /workspace/</code></pre>



<p>Now, you can run the following <strong>Docker command</strong> to launch your application locally on your computer.</p>



<pre class="wp-block-code"><code class="">docker run --rm -it -p 8080:8080 --user=42420:42420 gradio_app:latest</code></pre>



<p><code>⚠️</code> <code><strong>The -p 8080:8080 argument indicates that you want to execute a port redirection from the port 8080 of your local machine into the port 8080 of the Docker container.</strong></code></p>



<p><code><strong>⚠️ Don't forget the --user=42420:42420 argument if you want to simulate the exact same behaviour that will occur on AI Deploy. It executes the Docker container as the specific OVHcloud user (user 42420:42420).</strong></code></p>



<p>Once started, your application should be available on <strong>http://localhost:8080</strong>.<br><br>Your Docker image seems to work? Good job 👍!<br><br>It&#8217;s time to push it and deploy it!</p>



<h3 class="wp-block-heading">Push the image into the shared registry</h3>



<p>❗ The shared registry of AI Deploy should only be used for testing purposes. Please consider attaching your own Docker registry. More information about this can be found <a href="https://docs.ovh.com/asia/en/publiccloud/ai/training/add-private-registry/" data-wpel-link="exclude">here</a>.</p>



<p>Then, you have to find the address of your <code>shared registry</code> by launching this command.</p>



<pre class="wp-block-code"><code class="">ovhai registry list</code></pre>



<p>Next, log in to the shared registry with your usual <code>OpenStack</code> credentials.</p>



<pre class="wp-block-code"><code class="">docker login -u &lt;user&gt; -p &lt;password&gt; &lt;shared-registry-address&gt;</code></pre>



<p>To finish, you need to push the created image into the shared registry.</p>



<pre class="wp-block-code"><code class="">docker tag gradio_app:latest &lt;shared-registry-address&gt;/gradio_app:latest
docker push &lt;shared-registry-address&gt;/gradio_app:latest</code></pre>



<p>Once you have pushed your custom docker image into the shared registry, you are ready to launch your app 🚀!</p>



<h2 class="wp-block-heading">Launch the AI Deploy</h2>



<p>The following command starts a new job running your Gradio application.</p>



<pre class="wp-block-code"><code class="">ovhai app run \
      --cpu 1 \
      --volume &lt;my_saved_model&gt;@&lt;region&gt;/:/workspace/model:RO \
      &lt;shared-registry-address&gt;/gradio_app:latest</code></pre>



<h3 class="wp-block-heading">Choose the compute resources</h3>



<p>First, you can choose either the number of GPUs or the number of CPUs for your app.</p>



<p><code><strong>--cpu 1</strong></code> indicates that we request 1 CPU for that app.</p>



<p>If you want, you can also launch this app with one or more <strong>GPUs</strong>.</p>



<h3 class="wp-block-heading">Attach Object Storage container</h3>



<p>Then, you need to attach <strong>1 volume</strong> to this app. It contains the model that you trained before in part <em>&#8220;Save and export the model for future inference&#8221;</em> of the <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/computer-vision/image-classification/tensorflow/weights-and-biases/notebook_Weights_and_Biases_MNIST.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">notebook</a>.</p>



<p><code><strong>--volume &lt;my_saved_model&gt;@&lt;region&gt;/:/workspace/model:RO</strong></code>&nbsp;is the volume attached for using your&nbsp;<strong>pretrained model</strong>, matching the <code>ovhai app run</code> command above.</p>



<p>This volume is read-only (<code>RO</code>) because you just need to use the model and not make any changes to this Object Storage container.</p>



<h3 class="wp-block-heading">Make the app public</h3>



<p>Finally, add the&nbsp;<code><strong>--unsecure-http</strong></code>&nbsp;attribute if you want your application to be reachable without any authentication.</p>



<figure class="wp-block-video aligncenter"><video height="970" style="aspect-ratio: 1914 / 970;" width="1914" controls src="https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-video-final-app.mov"></video></figure>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Well done 🎉! You have learned how to build your <strong>own Docker image</strong> for a dedicated <strong>sketch recognition app</strong>! </p>



<p>You have also been able to deploy this app thanks to <strong>OVHcloud&#8217;s AI Deploy</strong> tool.</p>



<p><em>In a second article, you will see how it is possible to deploy a Data Science project for <strong>interactive data visualization&nbsp;and prediction</strong>.</em></p>



<h3 class="wp-block-heading" id="want-to-find-out-more">Want to find out more?</h3>



<h5 class="wp-block-heading"><strong>Notebook</strong></h5>



<p>You want to access the notebook? Refer to the <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/computer-vision/image-classification/tensorflow/weights-and-biases/notebook_Weights_and_Biases_MNIST.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.<br><br>To launch and test this notebook with&nbsp;<strong>AI Notebooks</strong>, please refer to our <a href="https://docs.ovh.com/gb/en/publiccloud/ai/notebooks/tuto-weights-and-biases/" data-wpel-link="exclude">documentation</a>.</p>



<h6 class="wp-block-heading"><strong>App</strong></h6>



<p>You want to access the full code to create the Gradio app? Refer to the&nbsp;<a href="https://github.com/ovh/ai-training-examples/tree/main/apps/gradio/sketch-recognition" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>.<br><br>To launch and test this app with&nbsp;<strong>AI Deploy</strong>, please refer to&nbsp;our <a href="https://docs.ovh.com/gb/en/publiccloud/ai/deploy/tuto-gradio-sketch-recognition/" data-wpel-link="exclude">documentation</a>.</p>



<h2 class="wp-block-heading">References</h2>



<ul class="wp-block-list"><li><a href="https://towardsdatascience.com/how-to-run-a-data-science-project-in-a-docker-container-2ab1a3baa889" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">How to Run a Data Science Project in a Docker Container</a></li><li><a href="https://github.com/gradio-app/hub-sketch-recognition" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Sketch Recognition on Gradio</a></li></ul>
]]></content:encoded>
					
		
		<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2022/07/gradio-video-final-app.mov" length="2582723" type="video/quicktime" />

			</item>
		<item>
		<title>Object detection: train YOLOv5 on a custom dataset</title>
		<link>https://blog.ovhcloud.com/object-detection-train-yolov5-on-a-custom-dataset/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Thu, 17 Mar 2022 15:21:22 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Notebook]]></category>
		<category><![CDATA[AI Solutions]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=21622</guid>

<description><![CDATA[A guide to train a YOLO object detection algorithm on your dataset. It&#8217;s based on the YOLOv5 open source repository by&#160;Ultralytics. All the code for this blogpost is available in our dedicated GitHub repository. And you can test it in our AI Training, please refer to our documentation to boot it up. Introduction Computer Vision [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em>A guide to train a <strong>YOLO object detection algorithm</strong>  on your dataset.</em> It&#8217;s based on the YOLOv5 open source repository by&nbsp;<a href="https://github.com/ultralytics/yolov5" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Ultralytics</a>.</p>



<p>All the code for this blogpost is available in our dedicated <a href="https://github.com/ovh/ai-training-examples" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>. You can also test it with our <strong>AI Training</strong> product; please refer to <a href="https://docs.ovh.com/us/en/publiccloud/ai/" target="_blank" rel="noreferrer noopener" data-wpel-link="exclude">our documentation</a> to boot it up.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0880-1024x537.jpeg" alt="Object detection: train YOLOv5 on a custom dataset" class="wp-image-22728" width="512" height="269" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0880-1024x537.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0880-300x157.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0880-768x403.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0880.jpeg 1200w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<h3 class="wp-block-heading" id="introduction">Introduction</h3>



<h4 class="wp-block-heading" id="computer-vision">Computer Vision</h4>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>&#8221;&nbsp;<em>Computer vision is a specific field that deals with how computers can gain high-level understanding from digital images or videos.</em>&nbsp;&#8220;</p><p>&#8221;&nbsp;<em>From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do.</em>&nbsp;&#8220;</p><cite>Wikipedia</cite></blockquote>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0872-1024x361.png" alt="Computer Vision" class="wp-image-22710" width="768" height="271" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0872-1024x361.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0872-300x106.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0872-768x271.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0872.png 1409w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<p><strong>The use cases are numerous &#8230;</strong></p>



<ul class="wp-block-list"><li>Automotive: autonomous car</li><li>Medical: cell detection</li><li>Retailing: automatic basket content detection</li><li>&#8230;</li></ul>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="993" height="743" src="https://blog.ovhcloud.com/wp-content/uploads/2022/05/image-yolov5.png" alt="" class="wp-image-22984" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/05/image-yolov5.png 993w, https://blog.ovhcloud.com/wp-content/uploads/2022/05/image-yolov5-300x224.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/05/image-yolov5-768x575.png 768w" sizes="auto, (max-width: 993px) 100vw, 993px" /></figure></div>



<h4 class="wp-block-heading" id="object-detection">Object Detection</h4>



<p><strong>Object detection</strong> is a branch of computer vision that identifies and locates objects in an image or video stream. This technique allows objects to be labelled accurately. Object detection can be used to determine and count objects in a scene or to track their movement.</p>



<h3 class="wp-block-heading" id="objective">Objective</h3>



<p>The purpose of this article is to show how it is possible to train YOLOv5 to recognise objects. YOLOv5 is an object detection algorithm. Although closely related to image classification, object detection performs image classification on a more precise scale. Object detection locates and categorises features in images.</p>



<p>It is based on the YOLOv5 repository by&nbsp;<a href="https://github.com/ultralytics/yolov5" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Ultralytics</a>.</p>



<h4 class="wp-block-heading">Use case: <a href="https://cocodataset.org/#home" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">COCO dataset</a></h4>



<p>COCO is a large-scale object detection, segmentation, and captioning dataset. It has several features:</p>



<ul class="wp-block-list"><li>Object segmentation</li><li>Recognition in context</li><li>Superpixel stuff segmentation</li><li>330K images</li><li>1.5 million object instances</li><li>80 object categories</li><li>91 stuff categories</li><li>5 captions per image</li><li>250 000 people with keypoints</li></ul>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><a href="https://cocodataset.org/#home" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0873.png" alt="COCO dataset" class="wp-image-22712" width="413" height="220" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0873.png 550w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0873-300x160.png 300w" sizes="auto, (max-width: 413px) 100vw, 413px" /></a></figure></div>



<p class="has-ast-global-color-5-color has-ast-global-color-0-background-color has-text-color has-background">⚠️ Next, we will see how to build and use our <strong>own dataset</strong> to train a YOLOv5 model. But for this tutorial, we will use the <strong>COCO dataset</strong>.</p>



<p style="font-size:12px"><em>OVHcloud disclaims to the fullest extent authorized by law all warranties, whether express or implied, including any implied warranties of title, non-infringement, quiet enjoyment, integration, merchantability or fitness for a particular purpose regarding the use of the COCO dataset in the context of this notebook. The user shall fully comply with the terms of use that appears on the database website (https://cocodataset.org/).</em></p>



<h3 class="wp-block-heading" id="create-your-own-dataset">Create your own dataset</h3>



<p>To create our own dataset, we can follow these steps:</p>



<ul class="wp-block-list"><li><strong>Collect your training images</strong>: to get our object detector off the ground, we need to first collect training images.</li></ul>



<p class="has-ast-global-color-5-color has-ast-global-color-0-background-color has-text-color has-background">⚠️ You must pay attention to the format of the images in your dataset. Make sure your images are in <strong>.jpg</strong> format!</p>



<ul class="wp-block-list"><li><strong>Define the number of classes</strong>: we have to make sure that the number of objects in each class is uniformly distributed.</li></ul>



<ul class="wp-block-list"><li><strong>Annotate your training images</strong>: to train our object detector, we need to supervise its training using bounding box annotations. We have to draw a box around each object we want the detector to see and label each box with the object class we want the detector to predict.</li></ul>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation-1024x635.png" alt="" class="wp-image-21645" width="667" height="414" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation-1024x635.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation-300x186.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation-768x477.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation-1536x953.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation.png 1713w" sizes="auto, (max-width: 667px) 100vw, 667px" /></figure></div>



<p>↪️  Labels should be written as follows:</p>



<ol class="wp-block-list"><li><em>num_label</em>: label (or class) number. If you have <em>n</em> classes, the label number will be between <em>0</em> and <em>n-1</em></li><li><em>X</em> and <em>Y</em>: correspond to the coordinates of the centre of the box</li><li><em>width</em>: width of the box</li><li><em>height</em>: height of the box</li></ol>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0878.png" alt="Label format" class="wp-image-22726" width="445" height="217" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0878.png 889w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0878-300x146.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0878-768x375.png 768w" sizes="auto, (max-width: 445px) 100vw, 445px" /></figure></div>



<p>If an image contains several labels, we write a line for each label in the same <strong>.txt</strong> file.</p>
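<p>As an illustration (not part of the original tutorial), here is a minimal Python sketch that parses such a label file, assuming the normalised <em>num_label X Y width height</em> format described above:</p>

```python
# Parse a YOLO-format label file: one "class x_center y_center width height"
# line per object, with coordinates normalised to [0, 1].
def parse_label_file(text):
    boxes = []
    for line in text.strip().splitlines():
        num_label, x, y, w, h = line.split()
        boxes.append({
            "class": int(num_label),
            "x_center": float(x),
            "y_center": float(y),
            "width": float(w),
            "height": float(h),
        })
    return boxes

# Two objects in the same image -> two lines in the same .txt file.
sample = "0 0.48 0.63 0.69 0.71\n26 0.25 0.40 0.10 0.12"
boxes = parse_label_file(sample)
print(len(boxes))          # 2
print(boxes[0]["class"])   # 0
```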



<ul class="wp-block-list"><li><strong>Split your dataset</strong>: we choose how to disperse our data (for example, keep 80% data in the training set and 20% in the validation set).</li></ul>



<p class="has-ast-global-color-5-color has-ast-global-color-0-background-color has-text-color has-background">⚠️ Images and labels must have the same name.</p>



<p><em>Example:</em></p>



<p><code>data/train/images/img0.jpg     # image </code><br><code>data/train/labels/img0.txt     # label</code></p>
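<p>A minimal Python sketch of such an 80/20 split (our own example code): we split on the shared file stems so that each image stays paired with its label of the same name.</p>

```python
import random

def split_dataset(stems, train_ratio=0.8, seed=42):
    """Shuffle image/label stems and split them into train and valid sets.

    Because each image and its label share the same stem (img0.jpg / img0.txt),
    splitting on stems keeps the pairs together.
    """
    stems = list(stems)
    random.Random(seed).shuffle(stems)  # deterministic shuffle for a given seed
    cut = int(len(stems) * train_ratio)
    return stems[:cut], stems[cut:]

stems = [f"img{i}" for i in range(10)]
train, valid = split_dataset(stems)
print(len(train), len(valid))  # 8 2
```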



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0875.png" alt="Splitting the dataset" class="wp-image-22718" width="348" height="322" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0875.png 696w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0875-300x277.png 300w" sizes="auto, (max-width: 348px) 100vw, 348px" /></figure></div>



<ul class="wp-block-list"><li><strong>Set up files and directory structure</strong>: to train the YOLOv5 model, we need to add a <strong>.yaml</strong> file to describe the parameters of our dataset.</li></ul>



<p>We have to specify the train and validation files.</p>



<p class="has-ast-global-color-3-color has-text-color"><code>train: /workspace/data/train/images</code><br><code>val: /workspace/data/valid/images</code></p>



<p>After that, we define number and names of classes.</p>



<p class="has-ast-global-color-3-color has-text-color"><code>nc: 80</code><br><code>names: ['aeroplane', 'apple', 'backpack', 'banana', 'baseball bat', 'baseball glove', 'bear', 'bed', 'bench', 'bicycle', 'bird', 'boat', 'book', 'bottle', 'bowl', 'broccoli', 'bus', 'cake', 'car', 'carrot', 'cat', 'cell phone', 'chair', 'clock', 'cow', 'cup', 'diningtable', 'dog', 'donut', 'elephant', 'fire hydrant', 'fork', 'frisbee', 'giraffe', 'hair drier', 'handbag', 'horse', 'hot dog', 'keyboard', 'kite', 'knife', 'laptop', 'microwave', 'motorbike', 'mouse', 'orange', 'oven', 'parking meter', 'person', 'pizza', 'pottedplant', 'refrigerator', 'remote', 'sandwich', 'scissors', 'sheep', 'sink', 'skateboard', 'skis', 'snowboard', 'sofa', 'spoon', 'sports ball', 'stop sign', 'suitcase', 'surfboard', 'teddy bear', 'tennis racket', 'tie', 'toaster', 'toilet', 'toothbrush', 'traffic light', 'train', 'truck', 'tvmonitor', 'umbrella', 'vase', 'wine glass', 'zebra']</code></p>
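<p>As a small illustrative sketch (with a hypothetical 3-class list instead of COCO&#8217;s 80), such a <code>data.yaml</code> can be generated with plain string formatting, deriving <code>nc</code> from the class list so the two can never disagree:</p>

```python
def make_data_yaml(train_dir, val_dir, names):
    """Build a minimal YOLOv5 data.yaml as a string; nc is derived from names."""
    lines = [
        f"train: {train_dir}",
        f"val: {val_dir}",
        f"nc: {len(names)}",
        "names: [" + ", ".join(f"'{n}'" for n in names) + "]",
    ]
    return "\n".join(lines) + "\n"

# Tiny example with 3 classes instead of COCO's 80:
content = make_data_yaml(
    "/workspace/data/train/images",
    "/workspace/data/valid/images",
    ["cat", "dog", "person"],
)
print(content)
```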



<p>Let&#8217;s follow the different steps!</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="187" src="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0874-1024x187.png" alt="Object detection steps" class="wp-image-22716" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0874-1024x187.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0874-300x55.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0874-768x141.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0874.png 1350w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<h2 class="wp-block-heading" id="install-yolov5-dependences"><strong>Install YOLOv5 dependences</strong></h2>



<h3 class="wp-block-heading" id="1-clone-yolov5-repository">1. Clone YOLOv5 repository</h3>



<pre class="wp-block-code"><code class="">git clone https://github.com/ultralytics/yolov5 /workspace/yolov5</code></pre>



<h3 class="wp-block-heading" id="2-install-dependencies-as-necessary">2. Install dependencies as necessary</h3>



<p>Now, we have to go to the /<em>yolov5</em> folder and install the &#8220;<em>requirements.txt</em>&#8221; file containing all the necessary dependencies.</p>



<pre class="wp-block-code"><code class="">cd /workspace/yolov5</code></pre>



<p class="has-ast-global-color-5-color has-ast-global-color-0-background-color has-text-color has-background">⚠️&nbsp;Before installing the &#8220;<em>requirements.txt</em>&#8221; file, you have to&nbsp;<strong>modify</strong>&nbsp;it.</p>



<p>To access it, follow this path:<br><code>workspace </code>-&gt; <code>yolov5</code> -&gt; <code>requirements.txt</code><br><br>Then we have to replace the line <code>opencv-python&gt;=4.1.2</code> with <code>opencv-python-headless</code>.<br><br>Now we can save the &#8220;<em>requirements.txt</em>&#8221; file by selecting <code>File</code> in the Jupyter toolbar, then <code>Save File</code>.<br><br>Then, we can start the installation!</p>



<pre class="wp-block-code"><code class="">pip install -r requirements.txt</code></pre>



<p><strong>It&#8217;s almost over!</strong><br><br>The last step is to open a new terminal:<br><code>File</code> -&gt; <code>New</code> -&gt; <code>Terminal</code><br><br>Once in our new terminal, we run the following command: <code>pip uninstall setuptools</code><br><br>We confirm our action by selecting <code>Y</code>.<br><br>And finally, run the command: <code>pip install setuptools==59.5.0</code><br><br>The installations are now complete.</p>



<p class="has-ast-global-color-5-color has-ast-global-color-0-background-color has-text-color has-background">⚠️ <strong>Reboot the notebook kernel</strong> and follow the next steps!</p>



<h2 class="wp-block-heading" id="import-dependencies"><strong>Import dependencies</strong></h2>



<pre class="wp-block-code"><code class="">import torch
import yaml
from IPython.display import Image, clear_output
from utils.plots import plot_results</code></pre>



<p>We check GPU availability.</p>



<pre class="wp-block-code"><code class="">print('Setup complete. Using torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))</code></pre>



<h2 class="wp-block-heading" id="define-and-train-yolov5-model"><strong>Define and train YOLOv5 model</strong></h2>



<h3 class="wp-block-heading" id="1-define-yolov5-model">1. Define YOLOv5 model</h3>



<p>We go to the directory where the &#8220;<em>data.yaml</em>&#8221; file is located.</p>



<pre class="wp-block-code"><code class="">cd /workspace/data

with open("data.yaml", 'r') as stream:
    num_classes = str(yaml.safe_load(stream)['nc'])</code></pre>



<p>The model configuration used for the tutorial is <strong>YOLOv5s</strong>.</p>



<pre class="wp-block-code"><code class="">cat /workspace/yolov5/models/yolov5s.yaml</code></pre>



<p class="has-ast-global-color-3-color has-text-color" id="yolov5-by-ultralytics-gpl-3-0-license"># <code>YOLOv5 🚀 by Ultralytics, GPL-3.0 license</code><br><br><code># Parameters</code><br><code>nc: 80 # number of classes<br>depth_multiple: 0.33 # model depth multiple<br>width_multiple: 0.50 # layer channel multiple<br>anchors:<br>     - [10,13, 16,30, 33,23] # P3/8<br>     - [30,61, 62,45, 59,119] # P4/16<br>     - [116,90, 156,198, 373,326] # P5/32</code><br><br><code># YOLOv5 backbone<br>backbone:<br>   # [from, number, module, args]<br>   [[-1, 1, Focus, [64, 3]], # 0-P1/2<br>    [-1, 1, Conv, [128, 3, 2]], # 1-P2/4<br>    [-1, 3, C3, [128]],<br>    [-1, 1, Conv, [256, 3, 2]], # 3-P3/8<br>    [-1, 9, C3, [256]],<br>    [-1, 1, Conv, [512, 3, 2]], # 5-P4/16<br>    [-1, 9, C3, [512]],<br>    [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32<br>    [-1, 1, SPP, [1024, [5, 9, 13]]],<br>    [-1, 3, C3, [1024, False]], # 9<br>   ]</code><br><br><code># YOLOv5 head<br>head:</code><br>   <code>[[-1, 1, Conv, [512, 1, 1]],<br>    [-1, 1, nn.Upsample, [None, 2, 'nearest']],<br>    [[-1, 6], 1, Concat, [1]], # cat backbone P4<br>    [-1, 3, C3, [512, False]], # 13<br><br>    [-1, 1, Conv, [256, 1, 1]],<br>    [-1, 1, nn.Upsample, [None, 2, 'nearest']],<br>    [[-1, 4], 1, Concat, [1]], # cat backbone P3<br>    [-1, 3, C3, [256, False]], # 17 (P3/8-small)<br><br>    [-1, 1, Conv, [256, 3, 2]],<br>    [[-1, 14], 1, Concat, [1]], # cat head P4<br>    [-1, 3, C3, [512, False]], # 20 (P4/16-medium)<br><br>    [-1, 1, Conv, [512, 3, 2]],<br>    [[-1, 10], 1, Concat, [1]], # cat head P5<br>    [-1, 3, C3, [1024, False]], # 23 (P5/32-large)<br><br>    [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)<br>   ]</code></p>



<h3 class="wp-block-heading" id="2-run-yolov5-training">2. Run YOLOv5 training</h3>



<p><strong>Parameters definitions:</strong></p>



<ul class="wp-block-list"><li><em>img</em>: the input image size.</li><li><em>batch</em>: the batch size (number of training examples used in one iteration).</li><li><em>epochs</em>: the number of training epochs. An epoch corresponds to one full cycle through the training dataset.</li><li><em>data</em>: the path to the yaml file.</li><li><em>cfg</em>: the model configuration.</li></ul>



<p>We will train the YOLOv5s model on our custom dataset for 100 epochs.</p>
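<p>With these parameters, a typical training invocation looks like the following. The flag names come from the Ultralytics <code>train.py</code> script; the image size, batch size and run name are example choices:</p>

```shell
cd /workspace/yolov5

python train.py --img 640 --batch 16 --epochs 100 \
                --data /workspace/data/data.yaml \
                --cfg models/yolov5s.yaml \
                --name yolov5s_results
```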



<h2 class="wp-block-heading" id="evaluate-yolov5-performance-on-coco-dataset"><strong>Evaluate YOLOv5 performance on COCO dataset</strong></h2>



<pre class="wp-block-code"><code class="">Image(filename='/workspace/yolov5/runs/train/yolov5s_results/results.png', width=1000)  # view results.png</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="512" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/image-1-1024x512.png" alt="" class="wp-image-21947" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/image-1-1024x512.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/image-1-300x150.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/image-1-768x384.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/image-1-1536x768.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/image-1-2048x1024.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading" id="graphs-and-functions-explanation"><strong>Graphs and functions explanation</strong></h2>



<p><strong>Loss functions:</strong></p>



<p><em>For the training set:</em></p>



<ul class="wp-block-list"><li>Box: loss due to a box prediction not exactly covering an object.</li><li>Objectness: loss due to a wrong box-object IoU&nbsp;<strong>[1]</strong>&nbsp;prediction.</li><li>Classification: loss due to deviations from predicting ‘1’ for the correct classes and ‘0’ for all the other classes for the object in that box.</li></ul>



<p><em>For the valid set (the same loss functions as for the training data):</em></p>



<ul class="wp-block-list"><li>val Box</li><li>val Objectness</li><li>val Classification</li></ul>



<p><strong>Precision &amp; Recall:</strong></p>



<ul class="wp-block-list"><li>Precision: measures how accurate the predictions are, i.e. the percentage of predictions that are correct.</li><li>Recall: measures how well the model finds all the positives.</li></ul>



<p><em>How to calculate Precision and Recall ?</em></p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="370" src="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0876-1024x370.png" alt="Precision &amp; Recall" class="wp-image-22720" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0876-1024x370.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0876-300x109.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0876-768x278.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0876-1536x556.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0876.png 2010w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>
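<p>In code, the two formulas above reduce to simple ratios over true positives (TP), false positives (FP) and false negatives (FN). A minimal sketch:</p>

```python
def precision_recall(tp, fp, fn):
    """Precision: fraction of predictions that are correct.
    Recall: fraction of actual positives that were found."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# e.g. 8 correct detections, 2 false alarms, 4 missed objects
p, r = precision_recall(tp=8, fp=2, fn=4)  # → p = 0.8, r ≈ 0.667
```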



<p><strong>Accuracy functions:</strong></p>



<p>mAP (mean Average Precision) compares the ground-truth bounding box to the detected box and returns a score. The higher the score, the more accurate the model is in its detections.</p>



<ul class="wp-block-list"><li>mAP@0.5: when IoU is set to 0.5, the AP&nbsp;<strong>[2]</strong>&nbsp;of all pictures of each category is calculated, and then all categories are averaged: this is the mAP</li><li>mAP@0.5:0.95: represents the average mAP at different IoU thresholds (from 0.5 to 0.95 in steps of 0.05)</li></ul>
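<p>The relationship between the two metrics can be illustrated with NumPy; the per-threshold AP values below are made up for illustration only:</p>

```python
import numpy as np

# The ten IoU thresholds behind mAP@0.5:0.95 (0.5, 0.55, ..., 0.95)
thresholds = np.arange(0.5, 1.0, 0.05)

# Hypothetical AP measured at each threshold (AP drops as IoU gets stricter)
ap_per_threshold = np.linspace(0.9, 0.4, len(thresholds))

map_50 = ap_per_threshold[0]         # mAP@0.5: a single threshold
map_50_95 = ap_per_threshold.mean()  # mAP@0.5:0.95: mean over all ten
```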



<p><strong>[1] IoU (Intersection over Union)</strong>&nbsp;measures the overlap between two boundaries. It is used to evaluate how much the predicted boundary overlaps with the ground truth.</p>



<p><em>How to calculate IoU ?</em></p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0877-1024x321.png" alt="How to calculate IoU ?" class="wp-image-22724" width="768" height="241" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0877-1024x321.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0877-300x94.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0877-768x241.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/IMG_0877.png 1110w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>
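<p>The formula in the figure translates directly into code. A minimal sketch for axis-aligned boxes given as (x1, y1, x2, y2):</p>

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # Union = area A + area B - intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

iou((0, 0, 10, 10), (5, 5, 15, 15))  # → 25 / 175 ≈ 0.143
```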



<p><strong>[2] AP (Average precision)</strong>&nbsp;is a popular metric for measuring the accuracy of object detectors. It computes the average precision value for recall values from 0 to 1.</p>
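<p>A minimal sketch of this computation, using all-point interpolation of the precision-recall curve (not necessarily YOLOv5's exact implementation):</p>

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve, all-point interpolation."""
    # Pad so the curve starts at recall 0 and ends at recall 1
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Replace each precision by the max to its right (the PR "envelope")
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall increases
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```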



<h2 class="wp-block-heading" id="inference"><strong>Inference</strong></h2>



<h3 class="wp-block-heading" id="1-run-yolov5-inference-on-test-images">1. Run YOLOv5 inference on test images</h3>



<p>We can perform inference on the contents of the <strong>/data/images</strong> folder. You can add images of your choice to the same folder in order to run your own tests.</p>



<p>First, our trained weights are saved in the weights folder. We can use the best weights and print the test images list.</p>



<pre class="wp-block-code"><code class="">cd /workspace/yolov5/
python detect.py --weights runs/train/yolov5s_results/weights/best.pt --img 416 --conf 0.4 --source data/images --name yolov5s_results</code></pre>



<p class="has-ast-global-color-3-color has-text-color"><code>/workspace/yolov5 <br><strong>detect: </strong>weights=['runs/train/yolov5s_results/weights/best.pt'], source=data/images, imgsz=416, conf_thres=0.4, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=yolov5s_results, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False YOLOv5 🚀 v5.0-306-g4495e00 torch 1.8.1 CUDA:0 (Tesla V100S-PCIE-32GB, 32510.5MB) </code><br><br><code>Fusing layers...  </code><br><code>Model Summary: 308 layers, 21356877 parameters, 0 gradients, 51.3 GFLOPs </code></p>



<p>Then, we get the number of occurrences of each class detected in the image.</p>



<p class="has-ast-global-color-3-color has-text-color"><code>image 1/3 /workspace/yolov5/data/images/dog_street.jpg: 416x416 1 bicycle, 1 dog, 5 persons, Done. (0.017s)</code></p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="1024" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_chien_velo-1024x1024.jpeg" alt="" class="wp-image-21951" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_chien_velo-1024x1024.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_chien_velo-300x300.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_chien_velo-150x150.jpeg 150w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_chien_velo-768x768.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_chien_velo-1536x1536.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_chien_velo-2048x2048.jpeg 2048w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_chien_velo-70x70.jpeg 70w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<p class="has-ast-global-color-3-color has-text-color" id="block-88704cd0-e791-491a-a914-51a4c46f73ba"><code>image 2/3 /workspace/yolov5/data/images/lunch_computer.jpg: 288x416 1 broccoli, 1 cell phone, 1 cup, 1 diningtable, 1 fork, 1 keyboard, 1 knife, Done. (0.021s)</code></p>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="1024" height="683" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_lunch_box.jpeg" alt="" class="wp-image-21952" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_lunch_box.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_lunch_box-300x200.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_lunch_box-768x512.jpeg 768w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<p class="has-ast-global-color-3-color has-text-color" id="block-5daf33a9-ac73-403f-b24a-10b310429476"><code>image 3/3 /workspace/yolov5/data/images/policeman_horse.jpg: 320x416 6 cars, 2 horses, 2 persons, 1 traffic light, Done. (0.020s)</code></p>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="638" height="450" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_chevaux_policier.jpeg" alt="" class="wp-image-21953" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_chevaux_policier.jpeg 638w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/test_chevaux_policier-300x212.jpeg 300w" sizes="auto, (max-width: 638px) 100vw, 638px" /></figure></div>



<p class="has-ast-global-color-3-color has-text-color"><code>Results saved to runs/detect/yolov5s_results<br>Done. (0.322s)</code></p>



<h3 class="wp-block-heading" id="2-export-trained-weights-for-future-inference">2. Export trained weights for future inference</h3>



<p>Our weights are saved after training our model over 100 epochs.<br><br>Two weight files exist:<br>&#8211; the best one: <strong>best.pt</strong><br>&#8211; the last one: <strong>last.pt</strong><br><br>We choose the <strong>best</strong> one and start by renaming it.</p>



<pre class="wp-block-preformatted">import os

os.chdir("/workspace/yolov5/runs/train/yolov5s_results/weights/")
os.rename("best.pt", "yolov5s_100epochs.pt")</pre>



<p>Then, we copy it into a new folder where we can store all the weights generated during our trainings.</p>



<pre class="wp-block-preformatted">cp /workspace/yolov5/runs/train/yolov5s_results/weights/yolov5s_100epochs.pt /workspace/models_train/yolov5s_100epochs.pt</pre>



<h2 class="wp-block-heading" id="conclusion"><strong>Conclusion</strong></h2>



<p>The accuracy of the model can be improved by increasing the number of epochs, but beyond a certain point the accuracy plateaus, so the value should be chosen accordingly.<br><br>The accuracy obtained for the test set is&nbsp;<strong>93.71 %</strong>, which is a satisfactory result.</p>



<h3 class="wp-block-heading" id="want-to-find-out-more">Want to find out more?</h3>



<ul class="wp-block-list"><li><strong>Notebook</strong></li></ul>



<p>You want to access the notebook? Refer to the GitHub repository.<br><br>To launch and test this notebook with&nbsp;<strong>AI Notebooks</strong>, please refer to our documentation.</p>



<ul class="wp-block-list"><li><strong>App</strong></li></ul>



<p>You want to access the tutorial to create a simple app? Refer to the <a href="https://github.com/ovh/ai-training-examples" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>.<br><br>To launch and test this app with&nbsp;<strong>AI Training</strong>, please refer to <a href="https://docs.ovh.com/us/en/publiccloud/ai/" target="_blank" rel="noreferrer noopener" data-wpel-link="exclude">our documentation</a>.</p>



<h2 class="wp-block-heading" id="references"><strong>References</strong></h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AI Notebooks: analyze and classify sounds with AI</title>
		<link>https://blog.ovhcloud.com/ai-notebooks-analyze-and-classify-sounds-with-ai/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Fri, 04 Mar 2022 08:57:00 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Notebook]]></category>
		<category><![CDATA[AI Solutions]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=21594</guid>

<description><![CDATA[A guide to analyze and classify marine mammal sounds. Since you&#8217;re reading a blog post from a technology company, I bet you&#8217;ve heard about AI, Machine and Deep Learning many times before. Audio or sound classification is a technique with multiple applications in the field of AI and data science. Use cases Acoustic data classification: [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em>A guide to analyze and classify <strong>marine mammal sounds</strong>.</em></p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0834-1024x537.jpeg" alt="AI Notebooks: analyze and classify sounds with AI" class="wp-image-22610" width="512" height="269" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0834-1024x537.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0834-300x157.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0834-768x403.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0834.jpeg 1200w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<p>Since you&#8217;re reading a blog post from a technology company, I bet you&#8217;ve heard about AI, Machine and Deep Learning many times before.</p>



<p>Audio or sound classification is a technique with multiple applications in the field of AI and data science.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0835-1024x467.png" alt="AI Notebooks: analyze and classify sounds with AI" class="wp-image-22611" width="768" height="350" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0835-1024x467.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0835-300x137.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0835-768x350.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0835.png 1322w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<h3 class="wp-block-heading" id="Use-cases:">Use cases</h3>



<ul class="wp-block-list"><li><strong>Acoustic data classification:</strong></li></ul>



<p>&#8211; identifies location<br>&#8211; differentiates environments<br>&#8211; has a role in ecosystem monitoring</p>



<ul class="wp-block-list"><li><strong>Environmental sound classification:</strong></li></ul>



<p>&#8211; recognition of urban sounds<br>&#8211; used in security systems<br>&#8211; used in predictive maintenance<br>&#8211; used to differentiate animal sounds</p>



<ul class="wp-block-list"><li><strong>Music classification:</strong></li></ul>



<p>&#8211; classify music<br>&#8211; <em>key role in:</em> audio library organisation by genre, improvement of recommendation algorithms, discovery of trends and listener preferences through data analysis, &#8230;</p>



<ul class="wp-block-list"><li><strong>Natural language classification:</strong></li></ul>



<p>&#8211; human speech classification<br>&#8211; <em>common in:</em> chatbots, virtual assistants, text-to-speech applications, &#8230;</p>



<p>In this article we will look at the <strong>classification of marine mammal sounds</strong>.</p>



<h3 class="wp-block-heading" id="objective">Objective</h3>



<p>The purpose of this article is to explain how to train a model to classify audio files using <em>AI Notebooks</em>.<br><br>In this tutorial, the sounds in the dataset are in <em>.wav</em> format. To be able to use them and obtain results, it is necessary to pre-process this data by following different steps.</p>



<ul class="wp-block-list" id="block-c53a8333-8cfa-4558-81f3-827e57035439"><li>Analyse one of these audio recordings</li><li>Transform each sound file into a <em>.csv</em> file</li><li>Train your model from the <em>.csv</em> file</li></ul>



<p><strong>USE CASE:</strong> <a href="https://www.kaggle.com/shreyj1729/best-of-watkins-marine-mammal-sound-database/version/3" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Best of Watkins Marine Mammal Sound Database</a></p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/02/Capture-décran-2022-02-01-à-14.40.36-1024x617.png" alt="" class="wp-image-21968" width="781" height="471" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/02/Capture-décran-2022-02-01-à-14.40.36-1024x617.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/02/Capture-décran-2022-02-01-à-14.40.36-300x181.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/02/Capture-décran-2022-02-01-à-14.40.36-768x463.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/02/Capture-décran-2022-02-01-à-14.40.36.png 1031w" sizes="auto, (max-width: 781px) 100vw, 781px" /></figure></div>



<p>This dataset is composed of <strong>55 different folders </strong>corresponding to the marine mammals. In each folder are stored several sound files of each animal.<br><br>You can get more information about this dataset on this <a href="https://cis.whoi.edu/science/B/whalesounds/index.cfm" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">website</a>.<br><br>The data distribution is as follows:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0837-1024x681.png" alt="The data distribution " class="wp-image-22615" width="512" height="341" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0837-1024x681.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0837-300x200.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0837-768x511.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0837.png 1114w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>
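<p>The distribution above can be reproduced by counting the files in each class folder. A minimal sketch, assuming the dataset was extracted under a root folder such as <em>/workspace/data</em>:</p>

```python
import os
from collections import Counter

def files_per_class(root):
    """Count sound files per species folder (one folder = one class)."""
    return Counter({d: len(os.listdir(os.path.join(root, d)))
                    for d in sorted(os.listdir(root))
                    if os.path.isdir(os.path.join(root, d))})

# counts = files_per_class("/workspace/data")
# counts.most_common(5)  # the five best-represented species
```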



<p class="has-ast-global-color-5-color has-ast-global-color-0-background-color has-text-color has-background">⚠️ <em>For this example, we choose only the </em><strong>first 45 classes</strong><em> (or folders).</em></p>



<p>Let&#8217;s follow the different steps!</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="188" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-1024x188.png" alt="Data analysis and classification" class="wp-image-22613" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-1024x188.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-300x55.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-768x141.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-1536x282.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-2048x376.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading" id="audio-libraries">Audio libraries</h3>



<h4 class="wp-block-heading" id="1-loading-an-audio-file-with-librosa">1. Loading an audio file with Librosa</h4>



<p><em>Librosa</em> is a Python module for audio signal analysis. By using <em>Librosa</em>, you can extract key features from the audio samples, such as Tempo, Chroma Energy Normalized, Mel-Frequency Cepstral Coefficients, Spectral Centroid, Spectral Contrast, Spectral Rolloff, and Zero Crossing Rate. If you want to know more about this library, refer to the <a href="https://librosa.org/doc/latest/index.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a>.</p>



<pre class="wp-block-code"><code class="">import librosa
import librosa.display as lplt</code></pre>



<p>You can start by looking at your data by displaying different parameters using the <em>Librosa</em> library.<br><br>First, you can do a test on a file.</p>



<pre class="wp-block-code"><code class="">test_sound = "data/AtlanticSpottedDolphin/61025001.wav"</code></pre>



<p>Loads and decodes the audio.</p>



<pre class="wp-block-code"><code class="">data, sr = librosa.load(test_sound)
print(type(data), type(sr))</code></pre>



<p class="has-ast-global-color-3-color has-text-color"><code>&lt;class 'numpy.ndarray'&gt; &lt;class 'int'&gt;</code></p>



<pre class="wp-block-code"><code class="">librosa.load(test_sound, sr = 45600)</code></pre>



<p class="has-ast-global-color-3-color has-text-color"><code>(array([-0.0739522 , -0.06588229, -0.06673266, ..., 0.03021295, 0.05592792, 0. ], dtype=float32), 45600)</code></p>



<h4 class="wp-block-heading" id="2-playing-audio-with-ipython-display-audio">2. Playing Audio with IPython.display.Audio</h4>



<p><a href="https://ipython.org/ipython-doc/stable/api/generated/IPython.display.html#IPython.display.Audio" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">IPython.display.Audio</a> lets you play audio directly in a Jupyter notebook.<br><br>Use <em>IPython.display.Audio</em> to play the audio.</p>



<pre class="wp-block-code"><code class="">import IPython

IPython.display.Audio(data, rate = sr)</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0838.png" alt=" Playing the audio" class="wp-image-22618" width="518" height="130" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0838.png 690w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0838-300x75.png 300w" sizes="auto, (max-width: 518px) 100vw, 518px" /></figure></div>



<h3 class="wp-block-heading" id="visualizing-audio">Visualizing Audio</h3>



<h4 class="wp-block-heading" id="1-waveforms">1. Waveforms</h4>



<p><strong>Waveforms</strong> are visual representations of sound as time on the x-axis and amplitude on the y-axis. They allow for quick analysis of audio data.<br><br>We can plot the audio array using <em>librosa.display.waveplot</em>.</p>



<pre class="wp-block-code"><code class="">import matplotlib.pyplot as plt

plt.show(librosa.display.waveplot(data))</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/waveforms.png" alt="" class="wp-image-21601" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/waveforms.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/waveforms-300x199.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>



<h4 class="wp-block-heading" id="2-spectrograms">2. Spectrograms</h4>



<p>A <strong>spectrogram</strong> is a visual way of representing the intensity of a signal over time at various frequencies present in a particular waveform.</p>



<pre class="wp-block-code"><code class="">stft = librosa.stft(data)
plt.colorbar(librosa.display.specshow(stft, sr = sr, x_axis = 'time', y_axis = 'hz'))</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms1.png" alt="" class="wp-image-21602" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms1.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms1-300x199.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>



<pre class="wp-block-code"><code class="">stft_db = librosa.amplitude_to_db(abs(stft))
plt.colorbar(librosa.display.specshow(stft_db, sr = sr, x_axis = 'time', y_axis = 'hz'))</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms2.png" alt="" class="wp-image-21603" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms2.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms2-300x199.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>



<h4 class="wp-block-heading" id="3-spectral-rolloff">3. Spectral Rolloff</h4>



<p><strong>Spectral Rolloff</strong> is the frequency below which a specified percentage of the total spectral energy lies.<br><br><em>librosa.feature.spectral_rolloff</em> calculates the attenuation frequency for each frame of a signal.</p>



<pre class="wp-block-code"><code class="">spectral_rolloff = librosa.feature.spectral_rolloff(data + 0.01, sr = sr)[0]
plt.show(librosa.display.waveplot(data, sr = sr, alpha = 0.4))</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectral_rolloff.png" alt="" class="wp-image-21604" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectral_rolloff.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectral_rolloff-300x199.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>



<h4 class="wp-block-heading" id="4-chroma-feature">4. Chroma Feature</h4>



<p>This tool is perfect for analyzing musical features whose pitches can be meaningfully categorized and whose tuning is close to the equal temperament scale.</p>



<pre class="wp-block-code"><code class="">chroma = librosa.feature.chroma_stft(data, sr = sr)
lplt.specshow(chroma, sr = sr, x_axis = "time" ,y_axis = "chroma", cmap = "coolwarm")
plt.colorbar()
plt.title("Chroma Features")
plt.show()</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="280" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/chroma_feature.png" alt="" class="wp-image-21605" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/chroma_feature.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/chroma_feature-300x207.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>



<h4 class="wp-block-heading" id="5-zero-crossing-rate">5. Zero Crossing Rate</h4>



<p>A <strong>zero crossing</strong> occurs if successive samples have different algebraic signs.</p>



<ul class="wp-block-list"><li>The rate at which zero crossings occur is a simple measure of the frequency content of a signal.</li><li>The number of zero-crossings measures the number of times in a time interval that the amplitude of speech signals passes through a zero value.</li></ul>



<pre class="wp-block-code"><code class="">start = 1000
end = 1200
plt.plot(data[start:end])
plt.grid()</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="255" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/zero_crossing_rate.png" alt="" class="wp-image-21606" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/zero_crossing_rate.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/zero_crossing_rate-300x188.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>
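<p>Beyond eyeballing the plot, the rate itself is easy to compute with NumPy (for feature extraction later on, the <em>librosa.feature.zero_crossing_rate</em> function is used instead). A minimal sketch of the definition above:</p>

```python
import numpy as np

def zero_crossing_rate(x):
    """Fraction of successive sample pairs whose algebraic signs differ."""
    signs = np.sign(x)
    return float(np.mean(signs[1:] != signs[:-1]))

zero_crossing_rate(np.array([1.0, -1.0, 1.0, -1.0]))  # → 1.0 (every pair crosses)
```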



<h2 class="wp-block-heading" id="data-preprocessing"><strong>Data preprocessing</strong></h2>



<h4 class="wp-block-heading" id="1-data-transformation">1. Data transformation</h4>



<p>To train your model, preprocessing of data is required. First of all, you have to convert the <em>.wav</em> into a <em>.csv</em> file.</p>



<ul class="wp-block-list"><li>Define columns name:</li></ul>



<pre class="wp-block-code"><code class="">header = "filename length chroma_stft_mean chroma_stft_var rms_mean rms_var spectral_centroid_mean spectral_centroid_var spectral_bandwidth_mean spectral_bandwidth_var rolloff_mean rolloff_var zero_crossing_rate_mean zero_crossing_rate_var harmony_mean harmony_var perceptr_mean perceptr_var tempo mfcc1_mean mfcc1_var mfcc2_mean mfcc2_var mfcc3_mean mfcc3_var mfcc4_mean mfcc4_var label".split()</code></pre>



<ul class="wp-block-list"><li>Create the <em>data.csv</em> file:</li></ul>



<pre class="wp-block-code"><code class="">import csv

file = open('data.csv', 'w', newline = '')
with file:
    writer = csv.writer(file)
    writer.writerow(header)</code></pre>



<ul class="wp-block-list"><li>Define character string of marine mammals (45):</li></ul>



<p>There are 45 different marine animals, or 45 classes.</p>



<pre class="wp-block-code"><code class="">marine_mammals = "AtlanticSpottedDolphin BeardedSeal Beluga_WhiteWhale BlueWhale BottlenoseDolphin Boutu_AmazonRiverDolphin BowheadWhale ClymeneDolphin Commerson'sDolphin CommonDolphin Dall'sPorpoise DuskyDolphin FalseKillerWhale Fin_FinbackWhale FinlessPorpoise Fraser'sDolphin Grampus_Risso'sDolphin GraySeal GrayWhale HarborPorpoise HarbourSeal HarpSeal Heaviside'sDolphin HoodedSeal HumpbackWhale IrawaddyDolphin JuanFernandezFurSeal KillerWhale LeopardSeal Long_FinnedPilotWhale LongBeaked(Pacific)CommonDolphin MelonHeadedWhale MinkeWhale Narwhal NewZealandFurSeal NorthernRightWhale PantropicalSpottedDolphin RibbonSeal RingedSeal RossSeal Rough_ToothedDolphin SeaOtter Short_Finned(Pacific)PilotWhale SouthernRightWhale SpermWhale".split()</code></pre>



<ul class="wp-block-list"><li>Transform each <em>.wav</em> file into a <em>.csv</em> row:</li></ul>



<pre class="wp-block-code"><code class="">import os
import numpy as np

for animal in marine_mammals:

  for filename in os.listdir(f"/workspace/data/{animal}/"):

    sound_name = f"/workspace/data/{animal}/{filename}"
    y, sr = librosa.load(sound_name, mono = True, duration = 30)
    chroma_stft = librosa.feature.chroma_stft(y = y, sr = sr)
    rmse = librosa.feature.rms(y = y)
    spec_cent = librosa.feature.spectral_centroid(y = y, sr = sr)
    spec_bw = librosa.feature.spectral_bandwidth(y = y, sr = sr)
    rolloff = librosa.feature.spectral_rolloff(y = y, sr = sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    mfcc = librosa.feature.mfcc(y = y, sr = sr)
    to_append = f'{filename} {np.mean(chroma_stft)} {np.mean(rmse)} {np.mean(spec_cent)} {np.mean(spec_bw)} {np.mean(rolloff)} {np.mean(zcr)}'
    
    for e in mfcc:
        to_append += f' {np.mean(e)}'

    to_append += f' {animal}'
    file = open('data.csv', 'a', newline = '')
    
    with file:
        writer = csv.writer(file)
        writer.writerow(to_append.split())</code></pre>



<ul class="wp-block-list"><li>Display the <em>data.csv</em> file:</li></ul>



<pre class="wp-block-code"><code class="">import pandas as pd

df = pd.read_csv('data.csv')</code></pre>



<h4 class="wp-block-heading" id="2-features-extraction">2. Features extraction</h4>



<p>In the preprocessing of the data, <em>feature extraction</em> is necessary before running the training. The purpose is to define the <strong>inputs</strong> and <strong>outputs </strong>of the neural network.</p>



<ul class="wp-block-list"><li><strong>OUTPUT</strong> (y): last column which is the <strong><em>label</em></strong>.</li></ul>



<p>You cannot use text directly for training. You will encode these labels with the <strong>LabelEncoder()</strong> function of <em>sklearn.preprocessing</em>.<br><br>Before running a model, you need to convert this type of categorical text data into numerical data that the model can understand.</p>



<pre class="wp-block-code"><code class="">from sklearn.preprocessing import LabelEncoder

class_list = df.iloc[:,-1]
converter = LabelEncoder()
y = converter.fit_transform(class_list)
print("y: ", y)</code></pre>



<p class="has-ast-global-color-3-color has-text-color"><code>y : [ 0 0 0 ... 44 44 44]</code></p>



<ul class="wp-block-list"><li><strong>INPUTS</strong> (X): all other columns are input parameters to the neural network.</li></ul>



<p>Remove the first column which does not provide any information for the training (the filename) and the last one which corresponds to the output.</p>



<pre class="wp-block-code"><code class="">import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(np.array(df.iloc[:, 1:26], dtype=float))</code></pre>



<h4 class="wp-block-heading" id="3-split-dataset-for-training">3. Split dataset for training</h4>



<pre class="wp-block-code"><code class="">from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)</code></pre>
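<p>By default the split is random and different on every run. For reproducible experiments with the same class balance in both sets, <em>train_test_split</em> also accepts <strong>random_state</strong> and <strong>stratify</strong> arguments. A self-contained sketch on toy data (the real call would pass the X and y built above):</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the real feature matrix and labels:
# 100 samples, 26 features, 5 balanced hypothetical classes
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 26))
y = np.repeat(np.arange(5), 20)

# random_state fixes the shuffle; stratify keeps each class at the same
# proportion in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)  # (80, 26) (20, 26)
```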



<h2 class="wp-block-heading" id="building-the-model"><strong>Building the model</strong></h2>



<p id="block-08360fbe-4253-417f-ab02-9a4ee8b0d753">The first step is to build the model and display its summary.<br><br>All hidden layers use a <strong>ReLU</strong> activation function, the output layer uses a <strong>Softmax</strong> function, and <strong>Dropout</strong> layers are added to avoid overfitting.</p>



<pre class="wp-block-code"><code class="">import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation = 'relu', input_shape = (X_train.shape[1],)),
    tf.keras.layers.Dropout(0.2),

    tf.keras.layers.Dense(256, activation = 'relu'),
    tf.keras.layers.Dropout(0.2),

    tf.keras.layers.Dense(128, activation = 'relu'),
    tf.keras.layers.Dropout(0.2),

    tf.keras.layers.Dense(64, activation = 'relu'),
    tf.keras.layers.Dropout(0.2),

    tf.keras.layers.Dense(45, activation = 'softmax'),
])

print(model.summary())</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="560" height="427" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/model_summary.png" alt="" class="wp-image-21598" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/model_summary.png 560w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/model_summary-300x229.png 300w" sizes="auto, (max-width: 560px) 100vw, 560px" /></figure></div>



<h2 class="wp-block-heading" id="model-training-and-evaluation"><strong>Model training and evaluation</strong></h2>



<p>The <strong>Adam</strong> optimizer is used to train the model over <em>100 epochs</em>, as it gave the best results in our experiments.<br><br>The loss is calculated with the <strong>sparse_categorical_crossentropy</strong> function, which is suited to integer-encoded labels.</p>



<pre class="wp-block-code"><code class="">def trainModel(model, epochs, optimizer):
    batch_size = 128
    model.compile(optimizer = optimizer, loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
    return model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs = epochs, batch_size = batch_size)</code></pre>



<p>Now, launch the training!</p>



<pre class="wp-block-code"><code class="">model_history = trainModel(model = model, epochs = 100, optimizer = 'adam')</code></pre>
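<p>Since accuracy eventually plateaus, you do not have to guess the right number of epochs: an <strong>EarlyStopping</strong> callback can halt training once the validation loss stops improving. A minimal sketch on toy data with a tiny hypothetical model (in the notebook you would pass the callback to <em>model.fit</em> inside <em>trainModel</em>):</p>

```python
import numpy as np
import tensorflow as tf

# Toy data standing in for X_train / y_train (26 features, 3 fake classes)
X_toy = np.random.rand(64, 26).astype('float32')
y_toy = np.random.randint(0, 3, size=64)

model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(26,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Stop when val_loss has not improved for `patience` epochs,
# and roll back to the best weights seen so far
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=10, restore_best_weights=True)

history = model.fit(X_toy, y_toy, validation_split=0.2,
                    epochs=5, batch_size=16,
                    callbacks=[early_stop], verbose=0)
```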



<ul class="wp-block-list"><li>Display <strong>loss</strong> curves</li></ul>



<pre class="wp-block-code"><code class="">import matplotlib.pyplot as plt

loss_train_curve = model_history.history["loss"]
loss_val_curve = model_history.history["val_loss"]
plt.plot(loss_train_curve, label = "Train")
plt.plot(loss_val_curve, label = "Validation")
plt.legend(loc = 'upper right')
plt.title("Loss")
plt.show()</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="390" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/02/loss.png" alt="" class="wp-image-22523" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/02/loss.png 390w, https://blog.ovhcloud.com/wp-content/uploads/2022/02/loss-300x207.png 300w" sizes="auto, (max-width: 390px) 100vw, 390px" /></figure></div>



<ul class="wp-block-list"><li>Display <strong>accuracy</strong> curves</li></ul>



<pre class="wp-block-code"><code class="">acc_train_curve = model_history.history["accuracy"]
acc_val_curve = model_history.history["val_accuracy"]
plt.plot(acc_train_curve, label = "Train")
plt.plot(acc_val_curve, label = "Validation")
plt.legend(loc = 'lower right')
plt.title("Accuracy")
plt.show()</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="390" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/02/accuracy.png" alt="" class="wp-image-22524" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/02/accuracy.png 390w, https://blog.ovhcloud.com/wp-content/uploads/2022/02/accuracy-300x207.png 300w" sizes="auto, (max-width: 390px) 100vw, 390px" /></figure></div>



<pre class="wp-block-code"><code class="">test_loss, test_acc = model.evaluate(X_test, y_test, batch_size = 128)
print("The test loss is: ", test_loss)
print("The best accuracy is: ", test_acc*100)</code></pre>



<p class="has-ast-global-color-3-color has-text-color"><code>20/20 [==============================] - 0s 3ms/step - loss: 0.2854 - accuracy: 0.9371</code><br><code>The test loss is: 0.24700121581554413</code><br><code>The best accuracy is: 93.71269345283508</code></p>
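<p>With 45 classes, the overall accuracy can hide weak classes; <em>scikit-learn</em>'s <strong>confusion_matrix</strong> and <strong>classification_report</strong> give a per-class view. A sketch with hypothetical predictions (in the notebook, <code>y_pred = np.argmax(model.predict(X_test), axis=1)</code>):</p>

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical ground truth and predictions for 3 classes;
# in the notebook y_pred comes from the trained model
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
# Per-class precision, recall and F1 score
print(classification_report(y_true, y_pred))
```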



<h2 class="wp-block-heading"><strong>Save the model for future inference</strong></h2>



<h4 class="wp-block-heading">1. Save and store the model in an OVHcloud Object Container</h4>



<pre class="wp-block-code"><code class="">model.save('/workspace/model-marine-mammal-sounds/saved_model/my_model')</code></pre>



<p>You can check your model directory.</p>



<pre class="wp-block-code"><code class="">%ls /workspace/model-marine-mammal-sounds/saved_model</code></pre>



<p>The <strong><em>saved_model</em></strong> directory contains an <em>assets</em> folder, a <em>saved_model.pb</em> file and a <em>variables</em> folder.</p>



<pre class="wp-block-code"><code class="">%ls /workspace/model-marine-mammal-sounds/saved_model/my_model</code></pre>



<p>You can then load this model as follows.</p>



<pre class="wp-block-code"><code class="">model = tf.keras.models.load_model('/workspace/model-marine-mammal-sounds/saved_model/my_model')</code></pre>
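<p>A quick sanity check after reloading is to verify that the restored model produces the same predictions as the in-memory one. A self-contained sketch with a tiny hypothetical model and a temporary path (the article stores the model in TensorFlow's SavedModel format; the single-file <em>.keras</em> format is used here for brevity):</p>

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Tiny hypothetical model with the article's input width (26 features)
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(26,)),
    tf.keras.layers.Dense(4, activation='softmax'),
])
x = np.random.rand(3, 26).astype('float32')

# Save and reload from a temporary location
path = os.path.join(tempfile.mkdtemp(), 'my_model.keras')
model.save(path)
reloaded = tf.keras.models.load_model(path)

# The reloaded model should be numerically identical
assert np.allclose(model.predict(x, verbose=0),
                   reloaded.predict(x, verbose=0))
```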



<p><strong>Do you want to use this model in a Streamlit app?</strong> Refer to our <a href="https://github.com/ovh/ai-training-examples/tree/main/jobs/streamlit/marine_sounds_classification_app" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="509" src="https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00-1024x509.png" alt="" class="wp-image-22553" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00-1024x509.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00-300x149.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00-768x382.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00-1536x763.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00.png 1906w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption>Streamlit app overview</figcaption></figure></div>



<h3 class="wp-block-heading" id="conclusion">Conclusion</h3>



<p>The accuracy of the model can be improved by increasing the number of epochs, but beyond a certain point it plateaus, so the number of epochs should be chosen accordingly.<br><br>The accuracy obtained on the test set is <strong>93.71 %</strong>, which is a satisfactory result.</p>



<h4 class="wp-block-heading" id="want-to-find-out-more">Want to find out more?</h4>



<p>If you want to access the notebook, refer to the <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/tensorflow/tuto/notebook-marine-sound-classification.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.<br><br>To launch and test this notebook with <strong>AI Notebooks</strong>, please refer to our <a href="https://docs.ovh.com/gb/en/publiccloud/ai/notebooks/" data-wpel-link="exclude">documentation</a>.</p>



<p>You can also look at this presentation done at an <a href="https://startup.ovhcloud.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Startup Program</a> event at <a href="https://stationf.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Station F</a>:</p>


<div class="lazyblock-you-tube-gdpr-compliant-Z2oJhGR wp-block-lazyblock-you-tube-gdpr-compliant"><script type="module">
  import 'https://blog.ovhcloud.com/wp-content/assets/ovhcloud-gdrp-compliant-embedding-widgets/src/ovhcloud-gdrp-compliant-youtube.js';
</script>
      
      <ovhcloud-gdrp-compliant-youtube
          video="EN7XKmPpi78"
          debug></ovhcloud-gdrp-compliant-youtube>

</div>


<p class="has-text-align-center"><em><strong>I hope you have enjoyed this article. Try it for yourself!</strong></em></p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0839-1024x386.png" alt="AI Notebooks: analyze and classify sounds with AI" class="wp-image-22620" width="768" height="290" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0839-1024x386.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0839-300x113.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0839-768x290.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0839.png 1219w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<h3 class="wp-block-heading" id="references">References</h3>



<p><a href="https://blog.clairvoyantsoft.com/music-genre-classification-using-cnn-ef9461553726" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://blog.clairvoyantsoft.com/music-genre-classification-using-cnn-ef9461553726</a></p>



<p><a href="https://towardsdatascience.com/music-genre-classification-with-python-c714d032f0d8" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://towardsdatascience.com/music-genre-classification-with-python-c714d032f0d8</a></p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
