## How the Moderation Endpoint Works
When you submit a prompt to the Moderation API, it returns a JSON payload with three primary fields:

| Field | Type | Description |
|---|---|---|
| `flagged` | boolean | `true` if any policy violation is detected; `false` otherwise |
| `categories` | object | A map of violation categories (e.g., `hate`, `self_harm`) to booleans |
| `category_scores` | object | Confidence scores (0.0–1.0) for each category |
## Example Request
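A minimal sketch of a request in Python. The endpoint URL, environment variable name, and request body shape are assumptions for illustration; substitute your provider's actual values:

```python
import os

import requests

# Hypothetical endpoint and credential handling; replace with your provider's.
MODERATION_URL = "https://api.example.com/v1/moderations"
API_KEY = os.environ["MODERATION_API_KEY"]

response = requests.post(
    MODERATION_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": "Sample user prompt to screen."},
    timeout=10,
)
response.raise_for_status()
result = response.json()

# The fields described in the table above.
print(result["flagged"], result["category_scores"])
```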
## Example Response
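An illustrative payload matching the fields in the table above; the category names shown are examples rather than the full set, and the scores are invented for demonstration:

```json
{
  "flagged": true,
  "categories": {
    "hate": false,
    "self_harm": true
  },
  "category_scores": {
    "hate": 0.0012,
    "self_harm": 0.9714
  }
}
```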
Use the confidence values in `category_scores` to prioritize human review of borderline cases.

## Integrating Moderation into Your Application Workflow
Adopt a secure, five-step flow to vet user inputs before content generation (a runnable sketch follows the list):

1. Receive the user prompt.
2. Call the Moderation API.
3. If `flagged` is `true`, return an error: "Your request violates our content policy and cannot be processed." If `flagged` is `false`, continue.
4. Invoke the Generation API.
5. Return the generated response to the end user.
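A sketch of this flow in Python. The base URL, environment variable, and the Generation API's request/response shape are assumptions; `moderate` and `generate` are illustrative wrappers, not official client functions:

```python
import os

import requests

API_BASE = "https://api.example.com/v1"  # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['MODERATION_API_KEY']}"}


def moderate(prompt: str) -> dict:
    """Step 2: call the Moderation API on the raw user input."""
    resp = requests.post(f"{API_BASE}/moderations", headers=HEADERS,
                         json={"input": prompt}, timeout=10)
    resp.raise_for_status()
    return resp.json()


def generate(prompt: str) -> str:
    """Step 4: call the Generation API (request/response shape assumed)."""
    resp = requests.post(f"{API_BASE}/generations", headers=HEADERS,
                         json={"prompt": prompt}, timeout=30)
    resp.raise_for_status()
    return resp.json()["text"]


def handle_user_prompt(prompt: str) -> str:
    """Steps 1-5: moderate first; generate only if the input passes."""
    moderation = moderate(prompt)  # Step 2
    if moderation["flagged"]:      # Step 3: refuse flagged inputs
        return "Your request violates our content policy and cannot be processed."
    return generate(prompt)        # Steps 4 and 5
```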
Always enforce the moderation step. Skipping it may expose your system to disallowed or harmful content.
## Best Practices
- Batch multiple inputs into a single moderation request to cut round trips (see the sketch after this list).
- Monitor and log flagged inputs for auditing and continuous policy tuning.
- Adjust internal thresholds based on `category_scores` trends to minimize false positives.
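A sketch of the batching and threshold practices. It assumes the endpoint accepts a list of inputs and returns one result per input under a `results` key; that shape, the URL, and the threshold numbers are illustrative, not documented behavior:

```python
import os

import requests

API_BASE = "https://api.example.com/v1"  # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['MODERATION_API_KEY']}"}


def moderate_batch(prompts: list[str]) -> list[dict]:
    """Screen several inputs in one request (assumed batch response shape)."""
    resp = requests.post(f"{API_BASE}/moderations", headers=HEADERS,
                         json={"input": prompts}, timeout=30)
    resp.raise_for_status()
    return resp.json()["results"]


# Internal review thresholds, tuned from logged category_scores trends.
# These numbers are placeholders, not recommendations.
REVIEW_THRESHOLDS = {"hate": 0.40, "self_harm": 0.25}


def needs_human_review(result: dict) -> bool:
    """Route borderline cases to reviewers even when `flagged` is false."""
    scores = result["category_scores"]
    return any(scores.get(cat, 0.0) >= threshold
               for cat, threshold in REVIEW_THRESHOLDS.items())
```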