Learn to fine-tune an OpenAI model using the CLI, covering dataset preparation, data uploading, and fine-tune job monitoring.
In this lesson, you’ll learn how to fine-tune an OpenAI model using the CLI. We’ll cover:
Preparing your dataset
Uploading and formatting data
Launching and monitoring a fine-tune job
Although OpenAI is deprecating older fine-tuning models by January 4th and announcing GPT-3.5/GPT-4 support soon, the core workflow remains the same.
Most existing fine-tuning models (e.g., older Curie, Davinci) will retire by January 4th. It’s recommended to wait for GPT-3.5/GPT-4 fine-tuning, but you can continue experimenting with Ada, Babbage, Curie, and Davinci until then.
Pricing for fine-tuning varies by model. Ada remains the most cost-effective, while Davinci is the most expensive. Regardless of your budget, the process is identical across models.
We’ll build a simple chatbot that answers questions about President Biden’s February 7, 2023 State of the Union address. Since GPT-3.5’s knowledge cutoff is 2021, it won’t know this speech. We’ll fine-tune using a publicly available summary from the European Parliament.
First, convert your prompt/completion pairs into JSON Lines format. Example qna.jsonl:
Copy
Ask AI
{"prompt":"What did President Biden highlight about his interactions with Republicans?","completion":"President Biden highlighted his past interactions with Republicans as opportunities for bipartisan cooperation."}{"prompt":"What did President Biden emphasize about finding common ground with Republicans?","completion":"President Biden emphasized the importance of finding common ground with Republicans to achieve legislative goals."}{"prompt":"How did President Biden characterize the potential for compromise with Republicans?","completion":"President Biden characterized the potential for compromise with Republicans as achievable and essential for the country's progress."}
Use the OpenAI CLI to validate, dedupe, and add separators:
Copy
Ask AI
(venv) [Demo] > openai tools fine_tunes.prepare_data -f qna.jsonlAnalyzing...- Your file contains 411 prompt-completion pairs- 116 duplicated rows will be removed- Prompts will get a suffix separator '\n\n###\n\n'- Completions will get a suffix ending '***'- Completions will start with a whitespaceProceed? [Y/n] Y> Wrote prepared file to qna_prepared.jsonl
{"prompt":"When did Joe Biden deliver his State of the Union address? ->\n\n###\n\n","completion":" Joe Biden delivered his State of the Union address on 7 February 2023.\n END***"}{"prompt":"To whom did Joe Biden deliver his State of the Union address? ->\n\n###\n\n","completion":" Joe Biden delivered his State of the Union address to a joint session of the 118th US Congress (2023–2024).\n END***"}{"prompt":"What was the difference in the government's composition between 2022 and 2023? ->\n\n###\n\n","completion":" In 2022, the US had a 'united' government with Democrats holding a majority in both chambers; in 2023, it became 'divided' with a Republican-led House of Representatives.\n END***"}