<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Audio Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/audio/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/audio/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Fri, 06 Feb 2026 15:18:33 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>Audio Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/audio/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Create a podcast transcript with Whisper by AI Endpoints</title>
		<link>https://blog.ovhcloud.com/create-a-podcast-transcript-with-whisper-by-ai-endpoints/</link>
		
		<dc:creator><![CDATA[Stéphane Philippart]]></dc:creator>
		<pubDate>Thu, 28 Aug 2025 07:03:04 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[Audio]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29389</guid>

					<description><![CDATA[Check out this blog post if you want to know more about AI Endpoints. You can also find more info on AI Endpoints in our previous blog posts. This blog post explains how to create a podcast transcript using Whisper, a powerful automatic speech recognition (ASR) system developed by OpenAI. Whisper integrates with AI Endpoints and [&#8230;]]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image aligncenter size-full is-resized"><img fetchpriority="high" decoding="async" width="1024" height="1024" src="https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02.png" alt="A robot listening to a podcast" class="wp-image-29401" style="width:640px" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-768x768.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-70x70.png 70w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>Check out this <a href="https://blog.ovhcloud.com/enhance-your-applications-with-ai-endpoints/" data-wpel-link="internal">blog post</a> if you want to know more about AI Endpoints.<br>You can also find more info on <a href="https://endpoints.ai.cloud.ovh.net" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a> in our <a href="https://blog.ovhcloud.com/tag/ai-endpoints/" data-wpel-link="internal">previous blog posts</a>.</p>



<p>This blog post explains how to create a podcast transcript using Whisper, a powerful automatic speech recognition (ASR) system developed by OpenAI. Whisper integrates with <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a> and makes it easy to transcribe audio files and add features, like speaker diarization.</p>



<p><em>ℹ️ You can find the full code on <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/ai-endpoints/podcast-transcript-whisper/python" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub</a> ℹ️</em></p>



<h3 class="wp-block-heading">Environment Setup</h3>



<p>Define your environment variables for accessing <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>:</p>



<pre title="AI Endpoints environment variables" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">$ export OVH_AI_ENDPOINTS_WHISPER_URL=&lt;whisper model URL&gt;
$ export OVH_AI_ENDPOINTS_ACCESS_TOKEN=&lt;your_access_token&gt;
$ export OVH_AI_ENDPOINTS_WHISPER_MODEL=whisper-large-v3</code></pre>



<p>Install dependencies:</p>



<pre title="Dependencies installation" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">$ pip install -r requirements.txt</code></pre>
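


<p>If you are building the example from scratch rather than cloning the repository, note that the only third-party package the code below imports is the OpenAI client, so a minimal <em>requirements.txt</em> could contain just one line (the repository may pin additional packages or exact versions):</p>



<pre title="Minimal requirements.txt (assumed)" class="wp-block-code"><code lang="" class=" line-numbers">openai</code></pre>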



<h3 class="wp-block-heading">Audio transcription</h3>



<p>With Whisper and the OpenAI client, transcribing audio is as simple as writing a few lines of code:</p>



<pre title="Audio transcription" class="wp-block-code"><code lang="python" class="language-python line-numbers">import os
import json
from openai import OpenAI

# 🛠️ OpenAI client initialisation
client = OpenAI(base_url=os.environ.get('OVH_AI_ENDPOINTS_WHISPER_URL'), 
                api_key=os.environ.get('OVH_AI_ENDPOINTS_ACCESS_TOKEN'))

# 🎼 Audio file loading
with open("../resources/TdT20-trimed-2.mp3", "rb") as audio_file:
    # 📝 Call Whisper transcription API
    transcript = client.audio.transcriptions.create(
        model=os.environ.get('OVH_AI_ENDPOINTS_WHISPER_MODEL'),
        file=audio_file,
        temperature=0.0,
        response_format="verbose_json",
        extra_body={"diarize": True},
    )</code></pre>
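


<p>Before formatting anything, it can be useful to inspect the raw response. Here is a minimal sketch, reusing the <em>transcript</em> object from above; the output filename is arbitrary:</p>



<pre title="Inspect the raw response (optional)" class="wp-block-code"><code lang="python" class="language-python line-numbers"># 💾 Dump the raw verbose JSON response to a file for inspection
# (transcript.json is an arbitrary filename for this sketch)
with open("transcript.json", "w", encoding="utf-8") as f:
    f.write(transcript.model_dump_json(indent=2))</code></pre>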



<p>FYI:<br>&#8211; ‘<em>diarize</em>’ is not a standard Whisper parameter; we use it to enable diarization, since the OpenAI client lets us add extra body parameters to the request.<br>&#8211; diarization requires <em>response_format="verbose_json"</em>, which also returns the transcript as timed segments.</p>
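


<p>To make the <em>extra_body</em> mechanism more concrete, here is a hypothetical curl equivalent of the call above. The <em>/audio/transcriptions</em> route follows the OpenAI-compatible API convention and the <em>diarize</em> form field mirrors the <em>extra_body</em> parameter; both are assumptions for illustration, not a documented AI Endpoints contract:</p>



<pre title="Hypothetical curl equivalent" class="wp-block-code"><code lang="bash" class="language-bash line-numbers"># ⚠️ Hypothetical equivalent HTTP call (route and diarize field assumed)
$ curl -X POST "$OVH_AI_ENDPOINTS_WHISPER_URL/audio/transcriptions" \
    -H "Authorization: Bearer $OVH_AI_ENDPOINTS_ACCESS_TOKEN" \
    -F model="$OVH_AI_ENDPOINTS_WHISPER_MODEL" \
    -F file="@../resources/TdT20-trimed-2.mp3" \
    -F response_format="verbose_json" \
    -F diarize="true"</code></pre>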



<p>Once you have the full transcript, format it in a way that’s easy for humans to read.</p>



<h3 class="wp-block-heading">Create the script</h3>



<p>The JSON field ‘<em>diarization</em>’ contains all of the transcribed, diarized content.</p>



<pre title="JSON response for diarization" class="wp-block-code"><code lang="json" class="language-json line-numbers">"diarization": [
    {
      "speaker": 0,
      "text": "bla bla bla",
      "start": 16.5,
      "end": 26.38
    },
    {
      "speaker": 1,
      "text": "bla bla",
      "start": 26.38,
      "end": 32.6
    },
    {
      "speaker": 1,
      "text": "bla bla",
      "start": 32.6,
      "end": 40.6
    },
    {
      "speaker": 2,
      "text": "bla bla",
      "start": 40.6,
      "end": 42
    }
]</code></pre>



<p>Because the output is segmented, you can merge consecutive entries from the same speaker, as shown for speaker 1 in the sample above.</p>
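


<p>Applied to the sample above, merging would produce something like the following (the generic speaker labels are only illustrative; the code below substitutes real names):</p>



<pre title="Merged sample output (illustrative)" class="wp-block-code"><code lang="" class=" line-numbers">Speaker 0: bla bla bla

Speaker 1: bla bla bla bla

Speaker 2: bla bla</code></pre>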



<p>Here is some sample code for creating the script of a <a href="https://smartlink.ausha.co/tranches-de-tech" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">French podcast</a> featuring three speakers:</p>



<pre title="Merge sentences for same speaker" class="wp-block-code"><code lang="python" class="language-python line-numbers"># 🔀 Merge the dialog said by the same speaker     
diarizedTranscript = ''
speakers = ["Aurélie", "Guillaume", "Stéphane"]
previousSpeaker = -1
jsonTranscript = json.loads(transcript.model_dump_json())

# 💬 Only the diarization field is useful
for dialog in jsonTranscript["diarization"]:
    speaker = dialog.get("speaker")
    text = dialog.get("text")
    if (previousSpeaker == speaker):
        diarizedTranscript += f" {text}"
    else:
        diarizedTranscript += f"\n\n{speakers[speaker]}: {text}"
    previousSpeaker = speaker

print(f"\n📝 Diarized Transcript 📝:\n{diarizedTranscript}")
</code></pre>
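


<p>If you want to keep the script rather than just print it, you can also write it to a plain-text file at the end of the same program; a minimal sketch (the filename is arbitrary):</p>



<pre title="Save the script to a file (optional)" class="wp-block-code"><code lang="python" class="language-python line-numbers"># 💾 Persist the formatted, diarized script to a plain-text file
# (podcast-script.txt is an arbitrary filename for this sketch)
with open("podcast-script.txt", "w", encoding="utf-8") as f:
    f.write(diarizedTranscript)</code></pre>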



<p>Lastly, run the Python script:</p>



<pre class="wp-block-code"><code lang="" class=" line-numbers">$ python PodcastTranscriptWithWhisper.py

📝 Diarized Transcript 📝:

Stéphane: Bonjour tout le monde, ravi de vous retrouver pour l'enregistrement de ce dernier épisode de la saison avant de prendre des vacances bien méritées et de vous retrouver à la rentrée pour la troisième saison. Nous enregistrons cet épisode le 30 juin à la fraîche, enfin si on peut dire au vu des températures déjà présentes en cette matinée. Justement, elle revient chaudement de Sunnytech et c'est avec plaisir que je la retrouve pour l'enregistrement de cet épisode. Bonjour Aurélie, comment vas-tu ?

Aurélie: Salut, alors ça va très bien. Alors j'avoue, j'ai également très chaud. J'ai le ventilateur qui est juste à côté de moi donc ça va aller pour l'enregistrement du podcast.

Stéphane: Oui, c'est vrai qu'il fait un peu chaud. Et pour ce dernier épisode de la saison, c'est avec un mélange de joie mais aussi d'intimidation que je reçois notre invité. Si je fais ce métier de la façon dont je le fais, c'est grandement grâce à lui. Ce podcast, quelque part, a bien entendu des inspirations de ce que fait notre invité. Je suis donc très content de te recevoir Guillaume. Bonjour Guillaume, comment vas-tu et souhaites-tu te présenter à nos auditrices et auditeurs ? Bonjour à

Guillaume: tous et bien merci déjà de m'avoir invité. Je suis très content de rejoindre votre podcast pour cet épisode. Je m'appelle Guillaume Laforge, je suis un développeur Java depuis la première heure depuis très très longtemps. Je travaille chez Google, en particulier dans la partie Google Cloud. Je me focalise beaucoup sur tout ce qui est Generative AI vu que c'est à la mode évidemment. Les gens me connaissent peut-être ou peut-être ma voix d'ailleurs parce que je fais partie du podcast Les Cascodeurs qu'on a commencé il y a 15 ans ou quelque chose comme ça. Il y a trop longtemps. Ou alors ils me connaissent parce que je suis un des co-fondateurs du langage Groovy, Apache Groovy.</code></pre>



<p>Feel free to try out our new product, <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>, and share your thoughts.</p>



<p>Hang out with us on Discord in the #<em>ai-endpoints</em> channel: <em><a href="https://discord.gg/ovhcloud" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://discord.gg/ovhcloud</a></em>. See you soon!</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
