Learn to use the OpenAI Moderation API for inspecting text inputs and preventing unsafe content in your generation pipeline.
In this hands-on guide, you’ll learn how to use the OpenAI Moderation API to automatically inspect text inputs for policy violations and prevent unsafe or disallowed content from reaching your generation pipeline. By embedding a quick moderation check before calling the generation endpoint, you can maintain safe, compliant, and high-quality interactions with large language models.
When the API flags a prompt (flagged: true), you should:
- Identify the highest-scoring category from category_scores.
- Inform the user or sanitize the input.
- Log or audit the incident for compliance.
```python
import openai  # legacy (pre-1.0) SDK; the API key is read from the OPENAI_API_KEY environment variable

prompt = "I hate myself and want to harm myself."

# Run the prompt through the Moderation endpoint before any generation call
response = openai.Moderation.create(input=prompt)
result = response["results"][0]
print("Flagged:", result.flagged)

if result.flagged:
    # Find the highest-scoring category
    top_category = max(result.category_scores, key=result.category_scores.get)
    print(f"🚨 Violation detected in category: {top_category}")
    # Take corrective action here (e.g., reject, sanitize, or review)
else:
    # Safe to call the generation API (generate_response is your own downstream call)
    generate_response(prompt)
```
A few best practices to keep in mind:
- Pre-filter all user inputs with the Moderation API before any generation call (a sketch combining these practices follows this list).
- Log flagged prompts along with their category scores for audit and tuning.
- Gracefully inform users when their request is disallowed.
- Keep your API key secure using environment variables or a secrets manager.
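Putting these practices together, here is a minimal sketch of a pre-filter wrapper, written against the legacy SDK used in the examples above (moderate_and_generate and generate_response are illustrative names, not part of the OpenAI SDK):

```python
import logging

import openai  # legacy (pre-1.0) SDK; the API key is read from the OPENAI_API_KEY environment variable

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("moderation")


def generate_response(prompt: str) -> str:
    # Placeholder for your actual generation call (e.g., a chat completion request)
    raise NotImplementedError


def moderate_and_generate(prompt: str) -> str:
    """Pre-filter a prompt with the Moderation API before any generation call."""
    result = openai.Moderation.create(input=prompt)["results"][0]

    if result.flagged:
        # Log the flagged prompt with its category scores for audit and tuning
        logger.warning("Flagged prompt %r, scores=%s", prompt, dict(result.category_scores))
        # Gracefully inform the user instead of calling the generation API
        return "Sorry, this request can't be processed because it violates our content policy."

    # Input passed moderation; safe to call the generation API
    return generate_response(prompt)
```

Keeping the moderation check in a single wrapper like this means every entry point into your generation pipeline goes through the same filter and produces the same audit trail.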
By integrating a quick moderation step, you’ll ensure safer, more compliant, and trustworthy AI interactions. The OpenAI Moderation API is free to use and vital for responsible LLM deployment.