Guardrails
Guardrails let you add safety and validation checks on AI inputs and outputs. They run automatically before and after AI generation, and can pass, stop, or rewrite content based on your rules.
The system is built on Drupal's plugin architecture, so you can create custom guardrail plugins, group them into sets, and configure stop thresholds through the admin UI.
Architecture Overview
The Guardrails system has these main components:
- Guardrail plugins implement
AiGuardrailInterfaceand contain the actual validation logic. - Guardrail entities (
AiGuardrailconfig entities) wrap a plugin with admin-configurable settings. - Guardrail sets (
AiGuardrailSetconfig entities) group guardrails into pre-generation and post-generation lists with a stop threshold. GuardrailsEventSubscriberlistens toPreGenerateResponseEventandPostGenerateResponseEventand runs the configured guardrails.AiGuardrailHelperprovides a convenience method to attach a guardrail set to an AI input.
How Guardrails Execute
Input created
│
▼
AiGuardrailHelper::applyGuardrailSetToChatInput()
│ attaches a guardrail set to the input
▼
PreGenerateResponseEvent fires
│
▼
GuardrailsEventSubscriber::applyPreGenerateGuardrails()
│ runs each pre-generate guardrail plugin
│ aggregates StopResult scores
│ if score >= stop_threshold → forces output, skips AI call
│ if RewriteInputResult → rewrites the last message
▼
AI provider generates response
│
▼
PostGenerateResponseEvent fires
│
▼
GuardrailsEventSubscriber::applyPostGenerateGuardrails()
│ runs each post-generate guardrail plugin
│ if score >= stop_threshold → replaces output
│ if RewriteOutputResult → rewrites the response
▼
Final output returned
Result Types
Every guardrail plugin returns a GuardrailResultInterface from its processInput() and processOutput() methods. There are four result types:
| Result | stop() |
Effect |
|---|---|---|
PassResult |
false |
Input/output passes without changes. |
StopResult |
true |
Signals the input/output should be blocked. Carries a score (default 1.0) that is aggregated across guardrails. |
RewriteInputResult |
false |
Replaces the last chat message text with the result's message (pre-generation only). |
RewriteOutputResult |
false |
Replaces the AI response text with the result's message (post-generation only). |
All result types extend AbstractResult and take three constructor arguments:
new StopResult(
message: 'This content violates the regexp pattern.',
guardrail: $this, // The guardrail plugin instance.
context: [], // Optional context array.
score: 1.0, // StopResult only: the severity score.
);
Score Aggregation and Stop Threshold
Each AiGuardrailSet has a stop threshold (a float). When guardrails in a set run, the subscriber aggregates the score values from all StopResult instances. If the aggregated score reaches or exceeds the stop threshold, execution stops and the AI call is either skipped (pre-generation) or the output is replaced (post-generation).
This lets you combine multiple guardrails where each one contributes a partial score. For example, three guardrails each returning a StopResult with score 0.4 would aggregate to 1.2 — exceeding a threshold of 1.0.
Writing a Custom Guardrail Plugin
Guardrail plugins live in src/Plugin/AiGuardrail/ in your module and use the #[AiGuardrail] PHP attribute for discovery.
Minimal Example
<?php
declare(strict_types=1);
namespace Drupal\my_module\Plugin\AiGuardrail;
use Drupal\ai\Attribute\AiGuardrail;
use Drupal\ai\Guardrail\AiGuardrailPluginBase;
use Drupal\ai\Guardrail\Result\GuardrailResultInterface;
use Drupal\ai\Guardrail\Result\PassResult;
use Drupal\ai\Guardrail\Result\StopResult;
use Drupal\ai\OperationType\Chat\ChatInput;
use Drupal\ai\OperationType\Chat\ChatMessage;
use Drupal\ai\OperationType\InputInterface;
use Drupal\ai\OperationType\OutputInterface;
use Drupal\Core\StringTranslation\TranslatableMarkup;
/**
* Blocks messages that exceed a maximum word count.
*/
#[AiGuardrail(
id: 'max_word_count',
label: new TranslatableMarkup('Max Word Count'),
description: new TranslatableMarkup('Blocks input that exceeds a word limit.'),
)]
class MaxWordCount extends AiGuardrailPluginBase {
/**
* {@inheritdoc}
*/
public function processInput(InputInterface $input): GuardrailResultInterface {
if (!$input instanceof ChatInput) {
return new PassResult('Not a chat input, skipping.', $this);
}
$messages = $input->getMessages();
$last_message = end($messages);
if (!$last_message instanceof ChatMessage) {
return new PassResult('No text message found.', $this);
}
$word_count = str_word_count($last_message->getText());
$max_words = 500;
if ($word_count > $max_words) {
return new StopResult(
"Your message has $word_count words, which exceeds the $max_words word limit.",
$this,
);
}
return new PassResult('Word count within limits.', $this);
}
/**
* {@inheritdoc}
*/
public function processOutput(OutputInterface $output): GuardrailResultInterface {
return new PassResult('Output check not applicable.', $this);
}
}
With Configuration Form
If your guardrail needs admin-configurable settings, implement ConfigurableInterface and PluginFormInterface:
use Drupal\Component\Plugin\ConfigurableInterface;
use Drupal\Core\Form\FormStateInterface;
use Drupal\Core\Plugin\PluginFormInterface;
#[AiGuardrail(
id: 'max_word_count',
label: new TranslatableMarkup('Max Word Count'),
)]
class MaxWordCount extends AiGuardrailPluginBase implements ConfigurableInterface, PluginFormInterface {
public function getConfiguration(): array {
return $this->configuration;
}
public function setConfiguration(array $configuration): void {
$this->configuration = $configuration;
}
public function defaultConfiguration(): array {
return [];
}
public function buildConfigurationForm(array $form, FormStateInterface $form_state): array {
$form['max_words'] = [
'#type' => 'number',
'#title' => 'Maximum word count',
'#default_value' => $this->configuration['max_words'] ?? 500,
'#min' => 1,
];
return $form;
}
public function validateConfigurationForm(array &$form, FormStateInterface $form_state): void {}
public function submitConfigurationForm(array &$form, FormStateInterface $form_state): void {
$this->setConfiguration($form_state->getValues());
}
public function processInput(InputInterface $input): GuardrailResultInterface {
// Use $this->configuration['max_words'] instead of a hardcoded value.
$max_words = (int) ($this->configuration['max_words'] ?? 500);
// ... same logic as above ...
}
}
Special Interfaces
NonStreamableGuardrailInterface
A marker interface for guardrails that cannot process streamed responses. When a post-generation guardrail implements this interface, the event subscriber will reconstruct the full ChatOutput from the streamed iterator before passing it to processOutput().
Use this when your guardrail needs the complete response text to make a decision (e.g., sentiment analysis on the full reply).
use Drupal\ai\Guardrail\NonStreamableGuardrailInterface;
class MyGuardrail extends AiGuardrailPluginBase implements NonStreamableGuardrailInterface {
// No additional methods required -- it is a marker interface.
}
NonDeterministicGuardrailInterface
For guardrails that call AI services themselves (e.g., using an LLM to classify content). The plugin manager and event subscriber automatically inject the AiProviderPluginManager into plugins that implement this interface, so you can make AI calls within your guardrail logic.
Use the NeedsAiPluginManagerTrait for the boilerplate getter/setter:
use Drupal\ai\Guardrail\NeedsAiPluginManagerTrait;
use Drupal\ai\Guardrail\NonDeterministicGuardrailInterface;
class MyAiGuardrail extends AiGuardrailPluginBase implements NonDeterministicGuardrailInterface {
use NeedsAiPluginManagerTrait;
public function processInput(InputInterface $input): GuardrailResultInterface {
// Access the AI provider plugin manager.
$provider_manager = $this->getAiPluginManager();
// Get the default chat provider.
$default = $provider_manager->getDefaultProviderForOperationType('chat');
$provider = $provider_manager->createInstance($default['provider_id']);
// Make an AI call to classify the input.
$classification_input = new ChatInput([
new ChatMessage('user', 'Classify this text: ' . $text),
]);
$response = $provider->chat($classification_input, $default['model_id'], ['ai']);
// Use the classification result to decide pass/stop.
}
}
The built-in RestrictToTopic guardrail is a real-world example of this pattern. It uses an LLM to determine whether the user's message matches a list of allowed or disallowed topics.
Applying Guardrails to AI Input
Use AiGuardrailHelper::applyGuardrailSetToChatInput() to attach a guardrail set to any input before making an AI call:
// In a service or controller with dependency injection:
$guardrail_helper = \Drupal::service('ai.guardrail_helper');
$input = new ChatInput([
new ChatMessage('user', 'Tell me about Drupal.'),
]);
// Attach the guardrail set by its machine name.
$input = $guardrail_helper->applyGuardrailSetToChatInput('my_guardrail_set', $input);
// Make the AI call as usual. Guardrails run automatically via events.
$response = $provider->chat($input, $model_id, ['my_module']);
The method clones the input and calls setGuardrailSet() on it. When the AI provider fires its pre/post-generation events, the GuardrailsEventSubscriber picks up the attached set and runs the configured guardrails.
Built-in Guardrail Plugins
RegexpGuardrail
Checks the last chat message against a regular expression pattern. If the pattern matches, it returns a StopResult. Configurable fields:
- Regexp Pattern: The regular expression to match against.
- Violation Message: The message to display when the pattern matches. Use
@patternas a placeholder.
RestrictToTopic
Uses an AI provider to classify whether the user's message relates to a list of valid or invalid topics. This is a non-deterministic guardrail that implements both NonDeterministicGuardrailInterface and NonStreamableGuardrailInterface. Configurable fields:
- Valid Topics: List of allowed topics (one per line).
- Invalid Topics: List of disallowed topics (one per line).
- AI Provider/Model: The LLM used for topic classification.
- Violation messages: Custom messages for invalid topics found or valid topics missing.
Managing Guardrails in the UI
Guardrails are managed at Administration > Configuration > AI > Guardrails (/admin/config/ai/guardrails).
- Guardrails tab: Create and configure individual guardrail entities, each wrapping a guardrail plugin with specific settings.
- Guardrail Sets tab (
/admin/config/ai/guardrails/guardrail-sets): Create sets that group guardrails into pre-generation and post-generation lists, and set the stop threshold.
Required permissions:
administer guardrailsfor managing individual guardrail entities.administer guardrail setsfor managing guardrail sets.
Guardrail Modes
Guardrails can run at three points in the AI generation lifecycle, defined by AiGuardrailModeEnum:
| Mode | Enum Value | When |
|---|---|---|
| Pre-generate | pre |
Before the AI provider call. Can stop or rewrite the input. |
| Post-generate | post |
After the AI provider returns. Can stop or rewrite the output. |
| During-generate | during |
Reserved for future use with streaming guardrails. |