Demo: Performing Fine Tuning Part 1

In this lesson, you’ll learn how to fine-tune an OpenAI model using the CLI. We’ll cover:

Preparing your dataset
Uploading and formatting data
Launching and monitoring a fine-tune job

OpenAI is retiring its older fine-tuning models on January 4th and has announced that fine-tuning support for GPT-3.5 and GPT-4 is coming soon, but the core workflow shown here remains the same.

It’s recommended to wait for GPT-3.5/GPT-4 fine-tuning for new projects, but you can continue experimenting with the existing base models (Ada, Babbage, Curie, and Davinci) until they are retired.

Figure: OpenAI's fine-tuning documentation page, with a highlighted announcement about the deprecation of older models and an overview of the benefits of fine-tuning.

Pricing for fine-tuning varies by model. Ada remains the most cost-effective, while Davinci is the most expensive. Regardless of your budget, the process is identical across models.

Figure: OpenAI's pricing page for fine-tuning and embedding models, listing costs for Ada, Babbage, Curie, and Davinci.

Use Case: State of the Union Q&A Chatbot

We’ll build a simple chatbot that answers questions about President Biden’s February 7, 2023 State of the Union address. Because GPT-3.5’s knowledge cutoff is 2021, the model has no knowledge of this speech. We’ll fine-tune on question-and-answer pairs based on a publicly available summary from the European Parliament.

1. Prepare the Raw JSONL Dataset

First, convert your prompt/completion pairs into JSON Lines format. Example qna.jsonl:

{"prompt":"What did President Biden highlight about his interactions with Republicans?","completion":"President Biden highlighted his past interactions with Republicans as opportunities for bipartisan cooperation."}
{"prompt":"What did President Biden emphasize about finding common ground with Republicans?","completion":"President Biden emphasized the importance of finding common ground with Republicans to achieve legislative goals."}
{"prompt":"How did President Biden characterize the potential for compromise with Republicans?","completion":"President Biden characterized the potential for compromise with Republicans as achievable and essential for the country's progress."}

Save qna.jsonl alongside your PDF source:

(venv) [Demo] > ls
SOTU.pdf  qna.jsonl
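
If your question-and-answer pairs start out in a script rather than a file, a few lines of Python can produce the JSON Lines layout shown above. This is a minimal sketch: the helper name `write_jsonl` and the sample pair are illustrative assumptions, not part of the OpenAI tooling.

```python
import json


def write_jsonl(pairs, path):
    """Write (prompt, completion) tuples as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            record = {"prompt": prompt, "completion": completion}
            # ensure_ascii=False keeps non-ASCII characters human-readable
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical sample data, for illustration only
pairs = [
    (
        "When did Joe Biden deliver his State of the Union address?",
        "Joe Biden delivered his State of the Union address on 7 February 2023.",
    ),
]

# Example call (uncomment to create the file in the current directory):
# write_jsonl(pairs, "qna.jsonl")
```

Each record occupies exactly one line, which is what the preparation tool in the next step expects.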

2. Upload and Prepare the Dataset

Use the OpenAI CLI's data preparation tool to validate the file, remove duplicate rows, and add separators:

(venv) [Demo] > openai tools fine_tunes.prepare_data -f qna.jsonl
Analyzing...
- Your file contains 411 prompt-completion pairs
- 116 duplicated rows will be removed
- Prompts will get a suffix separator '\n\n###\n\n'
- Completions will get a suffix ending '***'
- Completions will start with a whitespace
Proceed? [Y/n] Y
> Wrote prepared file to qna_prepared.jsonl

Verify the new file:

(venv) [Demo] > ls
SOTU.pdf  qna.jsonl  qna_prepared.jsonl
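
The transformations the tool reports are simple enough to approximate by hand, which can help when checking what changed between the two files. The sketch below mirrors the steps listed in the output above (deduplication, prompt separator, leading whitespace, completion ending). It is an approximation for illustration, not the CLI's actual implementation, and `prepare_rows` is a hypothetical name.

```python
PROMPT_SEPARATOR = "\n\n###\n\n"   # suffix reported by the tool for prompts
COMPLETION_ENDING = "***"          # ending reported by the tool for completions


def prepare_rows(rows):
    """Approximate the reported preparation steps on a list of dicts."""
    seen = set()
    prepared = []
    for row in rows:
        key = (row["prompt"], row["completion"])
        if key in seen:
            continue  # drop exact duplicate rows
        seen.add(key)
        prepared.append({
            "prompt": row["prompt"] + PROMPT_SEPARATOR,
            # completions start with a single space and end with the marker
            "completion": " " + row["completion"].lstrip() + COMPLETION_ENDING,
        })
    return prepared
```

Comparing the output of a helper like this against `qna_prepared.jsonl` is one way to spot rows the tool altered in ways you did not expect.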

Sample Entries in `qna_prepared.jsonl`

{"prompt":"When did Joe Biden deliver his State of the Union address? ->\n\n###\n\n","completion":" Joe Biden delivered his State of the Union address on 7 February 2023.\n END***"}
{"prompt":"To whom did Joe Biden deliver his State of the Union address? ->\n\n###\n\n","completion":" Joe Biden delivered his State of the Union address to a joint session of the 118th US Congress (2023–2024).\n END***"}
{"prompt":"What was the difference in the government's composition between 2022 and 2023? ->\n\n###\n\n","completion":" In 2022, the US had a 'united' government with Democrats holding a majority in both chambers; in 2023, it became 'divided' with a Republican-led House of Representatives.\n END***"}
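
Before paying for a training run, it can be worth confirming that every prepared row follows the format above. The following sketch assumes the separator and ending reported by the tool; `check_prepared_line` is a hypothetical helper, not part of the OpenAI CLI.

```python
import json

PROMPT_SEPARATOR = "\n\n###\n\n"
COMPLETION_ENDING = "***"


def check_prepared_line(line):
    """Return a list of problems found in one line of a prepared JSONL file."""
    try:
        row = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(row, dict):
        return ["line is not a JSON object"]

    problems = []
    prompt = row.get("prompt", "")
    completion = row.get("completion", "")
    if not prompt.endswith(PROMPT_SEPARATOR):
        problems.append("prompt does not end with the separator")
    if not completion.startswith(" "):
        problems.append("completion does not start with a space")
    if not completion.endswith(COMPLETION_ENDING):
        problems.append("completion does not end with the ending marker")
    return problems
```

Running this over each line of `qna_prepared.jsonl` and printing any non-empty results gives a cheap pre-flight check.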

3. Create the Fine-Tuning Job

Select a base model (ada, babbage, curie, or davinci) and start fine-tuning:

(venv) [Demo] > openai api fine_tunes.create \
  -t qna_prepared.jsonl \
  -m davinci

You’ll see upload progress and receive a fine-tune ID:

Upload progress: 100%|████████████| 78.2k/78.2k [00:00<00:00]
Uploaded file: file-XXXXXXXXXXXX
Created fine-tune: ft-YYYYYYYYYYYY
Streaming events until fine-tuning is complete...

Model Comparison

Model	Cost (fine-tuning)	Status
Ada	$0.0008 / 1K tokens	Supported until Jan 4
Babbage	$0.0016 / 1K tokens	Supported until Jan 4
Curie	$0.0032 / 1K tokens	Supported until Jan 4
Davinci	$0.012 / 1K tokens	Supported until Jan 4
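
To get a feel for these numbers, training cost can be roughly approximated as the number of tokens in the training file, multiplied by the number of epochs, multiplied by the per-1K-token rate. The sketch below uses the rates from the table above and assumes you already have a token count. The function name and the default epoch count are assumptions for illustration, and actual billing may differ, so check the current pricing page.

```python
# Rates copied from the table above (USD per 1K tokens)
PRICE_PER_1K_TOKENS = {
    "ada": 0.0008,
    "babbage": 0.0016,
    "curie": 0.0032,
    "davinci": 0.012,
}


def estimate_training_cost(n_tokens, model, n_epochs=4):
    """Rough estimate: tokens x epochs x rate per 1K tokens."""
    rate = PRICE_PER_1K_TOKENS[model]
    return n_tokens / 1000 * rate * n_epochs


if __name__ == "__main__":
    # Hypothetical 100,000-token training file
    for name in PRICE_PER_1K_TOKENS:
        print(f"{name}: ${estimate_training_cost(100_000, name):.2f}")
```

Because the estimate scales linearly with both tokens and epochs, trimming duplicate rows or reducing epochs lowers the cost proportionally.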

4. Monitor and Complete

Follow the job status:

(venv) [Demo] > openai api fine_tunes.follow -i ft-YYYYYYYYYYYY

Example output:

[2023-08-13 12:16:31] Fine-tune enqueued. Queue number: 0
[2023-08-13 12:16:32] Fine-tune started
[2023-08-13 12:20:49] Completed epoch 1/4
[2023-08-13 12:22:11] Completed epoch 2/4
[2023-08-13 12:23:53] Completed epoch 3/4
[2023-08-13 12:25:24] Completed epoch 4/4
[2023-08-13 12:25:30] Fine-tune succeeded
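
If you save the event stream to a file, a small parser can report how long training took. This sketch assumes the timestamp layout shown in the example output above; `parse_events` and `training_seconds` are hypothetical helpers, not OpenAI functions.

```python
import re
from datetime import datetime

# Matches lines such as "[2023-08-13 12:16:32] Fine-tune started"
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")


def parse_events(log_text):
    """Return a list of (timestamp, message) tuples from the event log."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((stamp, match.group(2)))
    return events


def training_seconds(events):
    """Seconds elapsed between the 'started' and 'succeeded' events."""
    start = next(t for t, msg in events if msg == "Fine-tune started")
    end = next(t for t, msg in events if msg == "Fine-tune succeeded")
    return (end - start).total_seconds()
```

Differences between consecutive "Completed epoch" timestamps can be computed the same way to see how long each epoch takes.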

When the job completes, the CLI output includes a suggested `openai api completions.create` command for invoking your fine-tuned model. Note the model name it reports, since you will need it in the next step.

5. Invoke Your Fine-Tuned Model

Once the job succeeds, run:

openai api completions.create \
  -m davinci:ft-your-org-2023-08-13-06-56-03 \
  -p "When did Joe Biden deliver his State of the Union address?"

The model will respond based on your custom SOTU Q&A dataset. Adjust the model choice and epochs to balance accuracy vs. cost.
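
One detail worth checking: models fine-tuned on prepared data tend to behave more predictably when the inference prompt ends with the same separator seen during training, and when a stop sequence matching the completion ending is supplied. The sketch below only builds the request parameters and does not call the API. The function name, the `max_tokens` value, and the `" ->"` suffix (taken from the sample entries above) are assumptions for illustration.

```python
PROMPT_SEPARATOR = "\n\n###\n\n"   # separator added during data preparation
STOP_SEQUENCE = "***"              # completion ending added during preparation


def build_completion_request(question, model, prompt_suffix=" ->"):
    """Build completion parameters that mirror the training format."""
    return {
        "model": model,
        # End the prompt exactly as the training prompts ended
        "prompt": question + prompt_suffix + PROMPT_SEPARATOR,
        # Stop generating once the training-time ending marker appears
        "stop": [STOP_SEQUENCE],
        "max_tokens": 100,   # arbitrary cap, chosen for illustration
        "temperature": 0,    # favor deterministic answers for Q&A
    }
```

If your prepared completions also contain a marker such as ` END` before the ending, as the sample entries above do, you may prefer to use that marker as the stop sequence instead.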
