Prerequisites
- Python 3.7+
- An active OpenAI API key
- openai Python package (pip install openai)
- IPython for in-notebook audio playback (pip install ipython)
1. Play Audio Locally
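As a quick sketch of this step, you can render an inline player with IPython.display.Audio. Since you may not have an audio file at hand, this example synthesizes a one-second test tone; for a real file, `Audio("speech.mp3")` works the same way (the filename is a placeholder):

```python
import numpy as np
from IPython.display import Audio

rate = 22050
t = np.linspace(0, 1, rate, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)  # one-second 440 Hz test tone

player = Audio(tone, rate=rate)  # for a real file: Audio("speech.mp3")
player  # rendered as an inline audio widget in a notebook
```

In a plain script the final expression is a no-op; the inline player only appears in an IPython/Jupyter environment.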
Before sending the file to Whisper, verify playback in an IPython environment.

2. Transcribe with Whisper

Whisper currently offers the whisper-1 model for speech-to-text. Set your API key in the environment, then transcribe:
Make sure OPENAI_API_KEY is correctly set. On macOS/Linux, export it in your shell (export OPENAI_API_KEY="...").

3. Next Steps: NLP Pipelines
Once you have the raw transcript, you can feed it into large language models like GPT-3.5 Turbo or GPT-4 to:
- Summarize the speech
- Generate Q&A bots
- Classify or analyze sentiment
- Extract key topics
| Use Case | Model | Example Link |
|---|---|---|
| Summarization | GPT-3.5 Turbo | API Reference |
| Question & Answer | GPT-4 | API Reference |
| Sentiment Analysis | GPT-3.5 Turbo | Custom prompt engineering |
4. Run Whisper Locally
If you prefer not to use the API, you can run Whisper on your machine via the open-source repository: