Content Validation
A model-agnostic guardrail service that provides instant diagnostics on your AI traffic — prompts, responses, and tool-calls — without interrupting the user experience or breaking model workflows.
Request
{
  "tenant_id": "acme-corp",
  "version_id": "v2-strict",
  "content": {
    "type": "prompt",
    "text": "User: How do I..."
  },
  "checks": {
    "content_moderation": true,
    "prompt_injection": true,
    "pii_detection": true,
    "secrets_detection": true,
    "denied_topics": ["medical"],
    "blockwords": ["competitor-x"]
  }
}
Response 200 OK
{
  "request_id": "req_01HX...",
  "flagged": false,
  "results": {
    "content_moderation": 0,
    "prompt_injection": 0,
    "pii_detected": 0,
    "secrets_found": 0
  },
  "latency_ms": 7
}
Every request is evaluated against these independent, toggleable configurations. In V1, every check returns a binary result (0 or 1). V2 introduces granular confidence scoring.
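The request and response shapes above can be exercised with a small client-side helper. This is a minimal sketch, not an official SDK: the function names `build_request` and `fired_checks` are hypothetical, and no endpoint URL is assumed because none is given here. Only the field names are taken from the samples above.

```python
from typing import Any, Dict, Iterable, List, Optional

# Boolean toggles exactly as they appear in the sample request.
BOOLEAN_CHECKS = (
    "content_moderation",
    "prompt_injection",
    "pii_detection",
    "secrets_detection",
)


def build_request(
    tenant_id: str,
    version_id: str,
    text: str,
    content_type: str = "prompt",
    enabled: Iterable[str] = BOOLEAN_CHECKS,
    denied_topics: Optional[List[str]] = None,
    blockwords: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a request body shaped like the sample (hypothetical helper)."""
    enabled_set = set(enabled)
    checks: Dict[str, Any] = {name: name in enabled_set for name in BOOLEAN_CHECKS}
    if denied_topics:
        checks["denied_topics"] = list(denied_topics)
    if blockwords:
        checks["blockwords"] = list(blockwords)
    return {
        "tenant_id": tenant_id,
        "version_id": version_id,
        "content": {"type": content_type, "text": text},
        "checks": checks,
    }


def fired_checks(response: Dict[str, Any]) -> List[str]:
    """Return the result fields whose V1 binary value is 1."""
    return [name for name, value in response.get("results", {}).items() if value == 1]
```

Because V1 results are binary, the caller only needs to know which fields equal 1. The orchestrator decides what to do with that information.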
Reports category and severity for harmful content including hate speech, violence, sexual content, and more.
Detects jailbreak patterns, adversarial suffixes, and instruction override attempts in real time.
Returns entity type, value, and character offsets for sensitive personal information (PII) found in text.
Standalone entropy and regex-based logic to detect API keys, AWS secrets, GitHub tokens, and more.
Semantic check against customer-defined prohibited themes like medical advice, legal counsel, and more.
Exact string match or regex-based detection for profanity, competitor names, or restricted terminology.
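To make the secrets and blockword checks above more concrete, here is a rough local sketch of entropy-plus-regex secret detection and case-insensitive blockword matching. It is an illustration only and not the service's actual logic: the two patterns follow commonly documented token formats, and the length and entropy thresholds, the case-insensitive matching, and all function names are assumptions.

```python
import math
import re
from collections import Counter
from typing import List

# Illustrative patterns based on commonly documented formats (assumption:
# the real service may use different or broader rules).
KNOWN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

# Candidate tokens for the entropy test: long runs of key-like characters.
CANDIDATE = re.compile(r"[A-Za-z0-9+/_=-]{20,}")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the string in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def find_secrets(text: str, min_entropy: float = 4.0) -> List[str]:
    """Return labels for pattern hits plus one label for any high-entropy token."""
    hits = [name for name, pattern in KNOWN_PATTERNS.items() if pattern.search(text)]
    if any(shannon_entropy(tok) >= min_entropy for tok in CANDIDATE.findall(text)):
        hits.append("high_entropy_string")
    return hits


def match_blockwords(text: str, blockwords: List[str]) -> List[str]:
    """Case-insensitive substring match (assumed policy) against a blocklist."""
    lowered = text.lower()
    return [word for word in blockwords if word.lower() in lowered]
```

The regex pass catches tokens with a known prefix, while the entropy pass is a fallback for random-looking strings with no recognizable format. The threshold trades false positives against misses and would need tuning on real traffic.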
ApplyGuardRails acts as a middleware sidecar. Your orchestrator calls the Guardrail API at three critical junctions (user input, tool call, and model output), with zero impact on the model's output or creative flow.
Check the user's input for injection attempts, PII, blocked content, and denied topics before it reaches the LLM. Stop threats at the gate.
When the model generates tool parameters (e.g., JSON for a function call), inspect the payload before it executes. Prevent excessive agency attacks.
After the LLM generates its answer, run the output through all six pillars. Catch secrets, PII, or harmful content before it reaches your users.
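The three junctions above can be sketched as a single orchestrator turn. This is a hedged illustration: `guarded_turn`, the `check` callable, and the `"tool_call"` and `"response"` content types are hypothetical (only `"prompt"` appears in the sample request), and the decision to withhold flagged content is an orchestrator policy chosen for this example, since the service itself only reports diagnostics.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

Result = Dict[str, Any]
Report = List[Tuple[str, Result]]


def guarded_turn(
    user_input: str,
    generate: Callable[[str], Dict[str, Any]],
    check: Callable[[str, str], Result],
    execute_tool: Optional[Callable[[Dict[str, Any]], Any]] = None,
) -> Tuple[Optional[str], Report]:
    """Run one turn, calling the guardrail at each junction.

    `generate` is assumed to return a dict with "text" and an optional
    "tool_call"; `check(content_type, text)` is assumed to return a dict
    containing a boolean "flagged" field, as in the sample response.
    """
    report: Report = []

    # Junction 1: user input, before it reaches the LLM.
    result = check("prompt", user_input)
    report.append(("prompt", result))
    if result["flagged"]:
        return None, report

    reply = generate(user_input)

    # Junction 2: tool-call parameters, before the tool executes.
    tool_call = reply.get("tool_call")
    if tool_call is not None:
        result = check("tool_call", json.dumps(tool_call))
        report.append(("tool_call", result))
        if not result["flagged"] and execute_tool is not None:
            execute_tool(tool_call)

    # Junction 3: model output, before it reaches the user.
    result = check("response", reply["text"])
    report.append(("response", result))
    return (None if result["flagged"] else reply["text"]), report
```

Returning the full report alongside the answer keeps the guardrail purely diagnostic: the caller can log every result and apply its own blocking rules instead of the simple ones shown here.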
V2 introduces a complete security suite addressing the remaining OWASP LLM risks, expanding ApplyGuardRails into a full-spectrum AI security platform.
Scans model output for executable scripts (XSS/HTML) and alerts the application before they reach users.
Analyzes JSON parameters of a tool-call before execution to flag "Excessive Agency" risks.
Compares model output to the stored System Prompt; flags high-similarity responses indicating potential system prompt leakage.
Scans retrieved context from Vector DBs for poisoned instructions before they reach the LLM context window.
Monitors and flags anomalous token usage patterns indicative of DoS attacks or resource-drain exploits.
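As one illustration of the checks listed above, system prompt leakage can be approximated by measuring how much of the stored system prompt reappears verbatim in the model output. This is a sketch under stated assumptions and not the V2 implementation: it uses Python's standard `difflib.SequenceMatcher` for the comparison, and the function names and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def leak_score(system_prompt: str, output: str) -> float:
    """Fraction of the system prompt covered by its longest verbatim run in the output."""
    a = system_prompt.lower()
    b = output.lower()
    if not a:
        return 0.0
    # autojunk=False avoids the heuristic that ignores frequent characters
    # in long sequences, which would distort character-level matching.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / len(a)


def is_prompt_leak(system_prompt: str, output: str, threshold: float = 0.6) -> bool:
    """Flag outputs that reproduce a large contiguous share of the system prompt."""
    return leak_score(system_prompt, output) >= threshold
```

Scoring against the length of the system prompt, instead of using an overall similarity ratio, keeps the check sensitive when the leaked prompt is embedded in a much longer answer. Paraphrased leaks would need a semantic comparison that this sketch does not attempt.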
V1 is available now
Start with the 6-pillar binary diagnostic suite today while V2's confidence scoring and advanced OWASP coverage ship on our public roadmap.
View the roadmap →
Pay only for what you inspect. No seat fees, no egress, no surprises.