Shadow Mode
Test governance controls on live traffic without enforcement — see exactly what would be blocked before flipping the switch
Key Takeaways
- Shadow mode evaluates governance controls without enforcing them: traffic flows normally while you observe what would be caught
- Essential for measuring false positive rates before enforcement disrupts production workflows
- Enables data-driven confidence: promote to enforcement only when detection accuracy meets your threshold
- CrewCheck supports per-rule shadow mode, so you can test new rules individually without affecting existing enforcement
What Is Shadow Mode?
Shadow mode is a testing configuration where AI governance controls are evaluated against live traffic but not enforced. Requests pass through normally — nothing is blocked, masked, or modified — but the system records what would have happened if enforcement were active.
Think of it as a dress rehearsal for governance controls. You see the full picture — detection rates, false positives, policy impacts, latency overhead — without any risk of disrupting production workflows.
This is critical because governance controls that look perfect in testing often behave differently with real-world data. Customer support messages have different PII patterns than test data. Shadow mode reveals these gaps safely.
Why You Need Shadow Mode
Deploying governance controls straight to enforcement is risky. Shadow mode shows you detection rates, false positives, policy impact, and latency overhead on real traffic before you commit.
Shadow Mode vs. Enforcement Mode
The two modes differ in what happens after a detection:
Shadow Mode (Observe)
- Traffic flows normally — nothing blocked
- Detections are logged but not acted upon
- False positives don't affect users
- Measures detection accuracy safely
- Minimal latency impact on responses (detection still runs, but nothing is masked or blocked)
- Can run indefinitely without risk
Enforcement Mode (Act)
- Detected PII is masked before forwarding
- Policy violations block or modify requests
- False positives may disrupt workflows
- Requires high confidence in accuracy
- Adds detection latency to request path
- Requires monitoring and incident response
The Shadow-to-Enforcement Pipeline
The recommended rollout process for new governance controls follows a graduated pipeline:
Stage 1 — Shadow on sample traffic (10%): Route a small percentage of traffic through the new control in shadow mode. Validate basic functionality and catch obvious issues.
Stage 2 — Shadow on full traffic (100%): Expand to all traffic in shadow mode. Measure detection rates, false positives, and latency across the full range of real-world inputs.
Stage 3 — Enforcement on sample traffic (10%): Once shadow metrics meet your thresholds, enable enforcement on a small percentage. Monitor for user-reported issues.
Stage 4 — Full enforcement (100%): Promote to full enforcement with confidence backed by data from stages 1-3.
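Each stage should run for at least a few days to capture edge cases and traffic pattern variations. The full-traffic shadow stage (Stage 2) deserves longer: at least 1-2 weeks, as discussed in the FAQ.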
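The percentage splits in the pipeline can be sketched with deterministic, hash-based sampling, so a given request always lands in the same bucket. This is a minimal illustration, not CrewCheck's implementation: the `STAGES` table, `bucket`, and `mode_for_request` are hypothetical names, and the fallback mode for unsampled traffic (off during shadow sampling, shadow during partial enforcement) is an assumption the stages above do not spell out.

```python
import hashlib

# Hypothetical stage table: (mode for sampled traffic, sample percent,
# mode for the remaining traffic). The fallback column is an assumption.
STAGES = {
    1: ("shadow", 10, "off"),
    2: ("shadow", 100, "off"),
    3: ("enforce", 10, "shadow"),
    4: ("enforce", 100, "shadow"),
}


def bucket(request_id: str) -> int:
    """Map a request ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def mode_for_request(stage: int, request_id: str) -> str:
    """Return "shadow", "enforce", or "off" for this request at this stage."""
    sampled_mode, percent, fallback_mode = STAGES[stage]
    # Buckets below the percentage form the sample; 100 covers every bucket.
    if bucket(request_id) < percent:
        return sampled_mode
    return fallback_mode
```

Because the bucket depends only on the request ID, the 10% of traffic sampled in Stage 1 is the same 10% that sees enforcement first in Stage 3. Hashing a user or session ID instead would keep each user's experience consistent across requests.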
Per-Rule Shadow Mode
CrewCheck supports shadow mode at the individual rule level, not just globally. This means you can have existing rules in enforcement while testing new rules in shadow — simultaneously.
Example: Your Aadhaar masking rule is in enforcement (proven accurate), while a new ABHA ID detection rule runs in shadow mode. The Aadhaar rule actively protects traffic while you validate the ABHA rule's accuracy.
This granular control is essential for continuous improvement — you're always testing the next rule without risking the controls that are already working.
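As a rough sketch of per-rule modes, the example below keeps an Aadhaar rule in enforcement while an ABHA rule only records its matches. The `Rule` class, `apply_rules`, and both regular expressions are illustrative assumptions, not CrewCheck's configuration format or detectors; the patterns are deliberately simplified.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: re.Pattern
    mode: str  # "enforce" or "shadow"


# Simplified, illustrative patterns; a real detector would validate far more.
RULES = [
    Rule("aadhaar", re.compile(r"\b[2-9]\d{3}\s?\d{4}\s?\d{4}\b"), "enforce"),
    Rule("abha_id", re.compile(r"\b\d{2}-\d{4}-\d{4}-\d{4}\b"), "shadow"),
]


def apply_rules(text, rules=RULES):
    """Mask matches of enforced rules; only record matches of shadow rules."""
    events = []
    output = text
    for rule in rules:
        # Detection always runs against the original text, whatever the mode.
        for match in rule.pattern.finditer(text):
            events.append(
                {"rule": rule.name, "mode": rule.mode, "span": match.span()}
            )
        # Only enforced rules are allowed to change what gets forwarded.
        if rule.mode == "enforce":
            output = rule.pattern.sub(f"[{rule.name.upper()}]", output)
    return output, events


masked, events = apply_rules("Aadhaar 2345 6789 0123, ABHA 91-1234-5678-9012")
print(masked)  # Aadhaar [AADHAAR], ABHA 91-1234-5678-9012
```

Both rules produce an event, but only the enforced one alters the text. In this sketch, promoting the ABHA rule amounts to changing its `mode` field; the detection pattern stays the same.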
Metrics to Watch in Shadow Mode
Key metrics to monitor during shadow observation before promoting to enforcement:
- True positive rate: what percentage of actual PII is correctly detected?
- False positive rate: what percentage of flagged items are not actually PII? (Strictly, this ratio is the false discovery rate, 1 − precision.)
- Detection volume: how many detections per hour/day? Is this expected?
- Latency overhead: how much time does the control add to request processing?
- Coverage gaps: are there PII formats or contexts that the rule misses?
- Edge cases: any unexpected behavior with multilingual text, code, or structured data?
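The first two metrics can be computed from a hand-labelled sample of shadow traffic. The sketch below assumes a reviewer has marked each sampled item as PII or not; `Labeled`, `shadow_metrics`, `ready_to_promote`, and the threshold values are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Labeled:
    flagged: bool  # the shadow rule flagged this item
    is_pii: bool   # ground-truth label from a human reviewer


def shadow_metrics(samples):
    """Compute detection accuracy ratios from labelled shadow results."""
    tp = sum(1 for s in samples if s.flagged and s.is_pii)
    fp = sum(1 for s in samples if s.flagged and not s.is_pii)
    fn = sum(1 for s in samples if not s.flagged and s.is_pii)

    def ratio(num, den):
        return num / den if den else None  # undefined without data

    return {
        # Share of actual PII that the rule caught (recall).
        "true_positive_rate": ratio(tp, tp + fn),
        # Share of flagged items that were not PII (1 - precision).
        "flagged_not_pii_rate": ratio(fp, tp + fp),
    }


def ready_to_promote(metrics, min_tpr=0.95, max_flagged_not_pii=0.02):
    """Example gate; both thresholds are placeholders to tune per rule."""
    tpr = metrics["true_positive_rate"]
    fdr = metrics["flagged_not_pii_rate"]
    if tpr is None or fdr is None:
        return False
    return tpr >= min_tpr and fdr <= max_flagged_not_pii
```

Missed PII (false negatives) never appears in the detection log, so the true positive rate can only be estimated by labelling a sample that includes unflagged traffic.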
How CrewCheck Implements Shadow Mode
CrewCheck's shadow mode operates at the gateway level. When a rule is in shadow mode, the detection pipeline runs normally — extracting candidates, validating formats, scoring context — but the final masking step is skipped.
Instead, the detection result is logged to the audit trail with a 'shadow' flag. The governance dashboard shows shadow detections in a separate view, with metrics comparing what would have been caught versus what actually passed through.
Promoting a rule from shadow to enforcement is a single-click operation in the dashboard. The rule immediately begins masking detected PII, with the same detection logic that was validated during the shadow period.
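The flow described in this section (extract candidates, validate formats, score context, then skip or apply masking) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the function names, the context keywords, the scores, the threshold, and the audit record fields are not CrewCheck's actual pipeline or log schema.

```python
import json
import re

CANDIDATE = re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b")
CONTEXT_WORDS = ("aadhaar", "uid")  # hypothetical context hints


def valid_format(value):
    """Simplified format check: 12 digits, not starting with 0 or 1."""
    digits = re.sub(r"\s", "", value)
    return len(digits) == 12 and digits[0] not in "01"


def context_score(text, start):
    """Toy score: higher when a hint word appears shortly before the match."""
    window = text[max(0, start - 30):start].lower()
    return 0.9 if any(word in window for word in CONTEXT_WORDS) else 0.4


def process(text, mode, audit_log, threshold=0.5):
    """Run detection; mask only when mode is "enforce"."""
    output = text
    # Walk matches right to left so earlier offsets stay valid while masking.
    for match in reversed(list(CANDIDATE.finditer(text))):
        if not valid_format(match.group(0)):
            continue
        score = context_score(text, match.start())
        if score < threshold:
            continue
        # The same record is written in both modes; only the flag differs.
        audit_log.append(json.dumps({
            "rule": "aadhaar",
            "span": [match.start(), match.end()],
            "score": score,
            "shadow": mode == "shadow",
        }))
        if mode == "enforce":
            output = output[:match.start()] + "[AADHAAR]" + output[match.end():]
    return output
```

The detection path is identical in both modes, which is why metrics gathered during the shadow period carry over once the rule is promoted.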
Frequently Asked Questions
How long should I run shadow mode before enforcement?
At minimum 1-2 weeks on full traffic. This captures weekday/weekend patterns, edge cases, and gives you enough data volume for statistically meaningful accuracy metrics. For high-stakes rules, consider 4 weeks.
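One way to judge whether you have enough data is to put a confidence interval around the measured error rate. The sketch below uses the standard Wilson score interval; the function name and the example counts are illustrative, and the choice of interval is ours rather than something CrewCheck prescribes.

```python
import math


def wilson_interval(errors, total, z=1.96):
    """Wilson score interval for an error proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = errors / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    )
    return center - half_width, center + half_width


# 5 wrong flags out of 100 reviewed versus 50 out of 1000: same observed
# rate, but the larger sample pins it down much more tightly.
small = wilson_interval(5, 100)
large = wilson_interval(50, 1000)
```

If the upper bound of the interval is still above your acceptable error rate, keep the rule in shadow and collect more labelled detections.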
Does shadow mode add latency?
Minimal in practice. Detection still runs on every request in shadow mode, so its overhead is measured with the same production methodology as the main gateway. CrewCheck's current production measurement is sub-100ms gateway overhead at P95, reported separately from upstream provider time.
Can I shadow test on production traffic safely?
Yes — that's exactly what shadow mode is for. No traffic is modified, no requests are blocked. The only output is log entries showing what would have happened. There's zero risk to production users.
See Shadow Mode in action
Try CrewCheck's live governance demo — paste any text containing Indian PII and watch real-time detection, masking, and audit logging. No sign-up required.