PII Redaction vs Masking: Key Differences for DPDP Compliance
Understand the difference between PII redaction and masking, when to use each, and which approach satisfies DPDP Act requirements for AI applications.
Redaction vs Masking: The Core Distinction
Redaction removes the sensitive value entirely and replaces it with a type-tagged placeholder: 'Call me at [PHONE_NUMBER]' or 'Aadhaar: [AADHAAR]'. The original value is gone from the processed text — it cannot be recovered from the output alone.
Masking replaces characters while preserving format and partial information: 'XXXX XXXX 3456' (last 4 digits of Aadhaar), 'ABCDE****F' (PAN with the digits masked). The original value can often be inferred or recovered with additional context.
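The two transformations can be sketched in a few lines of Python. This is a minimal illustration: the regex below is simplified, and real Aadhaar validation also checks the Verhoeff checksum.

```python
import re

# Simplified 12-digit Aadhaar pattern in 4-4-4 groups (illustration only).
AADHAAR_RE = re.compile(r"\b(\d{4})\s?(\d{4})\s?(\d{4})\b")

def redact_aadhaar(text: str) -> str:
    # Redaction: the value is replaced by a type tag and cannot be
    # recovered from the output alone.
    return AADHAAR_RE.sub("[AADHAAR]", text)

def mask_aadhaar(text: str) -> str:
    # Masking: the format and last 4 digits survive, so partial
    # information about the original value remains.
    return AADHAAR_RE.sub(lambda m: "XXXX XXXX " + m.group(3), text)

print(redact_aadhaar("Aadhaar: 1234 5678 9012"))  # Aadhaar: [AADHAAR]
print(mask_aadhaar("Aadhaar: 1234 5678 9012"))    # Aadhaar: XXXX XXXX 9012
```

The asymmetry is visible in the output: the redacted string carries no trace of the original digits, while the masked string still leaks the last group.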
When to Use Redaction
Use redaction when the sensitive value is not needed for the downstream processing. For LLM prompts: if you're asking a customer service bot to help with a billing query, it doesn't need the customer's actual Aadhaar number — redact it. For audit logs: store the event ('PII detected: AADHAAR') without the value. For training data: remove all PII before any data reaches LLM fine-tuning pipelines.
The DPDP Act's data-minimisation principle argues strongly for redaction as the default: under Section 6(1), consent is 'limited to such personal data as is necessary for such specified purpose'. If the Aadhaar number isn't necessary for the specified purpose, it should be fully redacted.
When to Use Masking
Use masking when the user needs to verify or reference a value without exposing it fully. Classic example: show the last 4 digits of an Aadhaar in an account settings page so the user knows which Aadhaar is on file, without displaying the full number. Similarly for bank account numbers (XXXX XXXX 1234) and PAN (ABCDE####F).
Masking is also appropriate in customer support interfaces: the support agent can see the last 4 digits of a PAN for verification without seeing the full number that could be misused. This is 'need-to-know' access control implemented at the display layer.
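A display-layer mask along these lines might look like the following sketch. The `mask_pan` helper is hypothetical; it assumes the standard 10-character PAN format (five letters, four digits, one letter) and exposes only the 4-digit portion used for verification.

```python
def mask_pan(pan: str) -> str:
    # Display-layer masking: keep only the 4-digit verification portion
    # (positions 6-9 of a PAN) visible; mask the letters.
    if len(pan) != 10:
        raise ValueError("expected a 10-character PAN")
    return "X" * 5 + pan[5:9] + "X"

print(mask_pan("ABCDE1234F"))  # XXXXX1234X
```

The point of doing this at the display layer is that the full value never reaches the support agent's screen, even though it still exists in the backing store.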
DPDP Compliance Implications
For most AI use cases, redaction is required, not masking. When personal data is sent to a third-party LLM API, the provider's terms may permit retention or further processing of that data unless you have negotiated otherwise. Sending even masked Aadhaar (which still carries partial information) to an external API such as OpenAI's without a DPDP-compliant data processing agreement is risky.
Full redaction before the data leaves your infrastructure eliminates this risk entirely. The LLM receives '[AADHAAR] [PHONE]' instead of actual values — it can still understand the context of the query and provide a helpful response.
One exception: when you need the actual value in the LLM response (e.g., 'confirm you want to link Aadhaar XXXX-XXXX-1234 to your account'). In this case, tokenisation is a better fit than masking: keep the actual value in your own system, pass an opaque token to the LLM, and substitute the value back into the response during post-processing.
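A minimal tokenisation round trip could be sketched as follows. This is an assumed design, not a specific product API: `PIITokenizer` and its in-memory store are illustrative, and a production system would use a proper vault with access controls.

```python
import secrets

class PIITokenizer:
    """Sketch of tokenisation: real values stay in your store;
    only opaque tokens ever reach the LLM."""

    def __init__(self):
        # token -> original value; use an access-controlled vault in production
        self._store = {}

    def tokenize(self, value: str, pii_type: str) -> str:
        token = f"[{pii_type}:{secrets.token_hex(4)}]"
        self._store[token] = value
        return token

    def detokenize(self, text: str) -> str:
        # Post-process the LLM response: substitute real values back.
        for token, value in self._store.items():
            text = text.replace(token, value)
        return text

tok = PIITokenizer()
t = tok.tokenize("1234 5678 9012", "AADHAAR")
prompt = f"Confirm linking Aadhaar {t} to the account."    # token goes to the LLM
response = f"Please confirm you want to link Aadhaar {t}."  # LLM echoes the token
print(tok.detokenize(response))  # real value restored in post-processing
```

Unlike masking, nothing derivable from the original value crosses the API boundary; unlike redaction, the response can still present the real value to the user.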
Compliance Operational Checklist
This comparison should be treated as an operating control, not only as a reference article. The minimum checklist: a data inventory, a stated processing purpose, owner approval, PII detection at the AI boundary, redaction or tokenisation where possible, retention limits, vendor transfer records, and a tested user-rights workflow. This checklist gives engineering and compliance teams a shared language for deciding what must be blocked, what can run in shadow mode, and what needs human review before production release.
For AI systems, the review should include prompts, retrieved context, tool call arguments, model responses, logs, traces, analytics events, exports, and support attachments. Many incidents happen because teams scan only the visible form field while sensitive data moves through background context or observability tooling. CrewCheck's recommended pattern is to place the scanner at the request boundary, record the policy version, and keep audit evidence that shows which identifiers were detected and what action was taken.
A practical rollout starts with representative samples from production-like traffic. Run a DPDP scan, sort findings by identifier sensitivity and blast radius, fix Aadhaar, PAN, financial, health, children's, and precise-location exposure first, then move to consent wording, retention, deletion, and vendor review. Use shadow mode when false positives could disrupt users, and promote to enforcement only after the exceptions have owners and expiry dates.
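The shadow-versus-enforce decision with owned, expiring exceptions could be sketched like this (all names hypothetical; the exception table would normally live in configuration, not code):

```python
import datetime

# "shadow" logs findings without blocking; "enforce" blocks the request.
MODE = "shadow"

# Each exception must have an owner and an expiry date before promotion
# to enforcement.
EXCEPTIONS = {
    "support-attachment-pipeline": {
        "owner": "privacy-eng",
        "expires": datetime.date(2025, 12, 31),
    },
}

def decide(finding_source: str, today: datetime.date) -> str:
    exc = EXCEPTIONS.get(finding_source)
    if exc and today <= exc["expires"]:
        return "allow"  # an owned, unexpired exception
    return "log" if MODE == "shadow" else "block"

print(decide("support-attachment-pipeline", datetime.date(2025, 6, 1)))  # allow
print(decide("chat-prompt", datetime.date(2025, 6, 1)))                  # log
```

Expiry dates keep exceptions from becoming permanent holes: once an exception lapses, its traffic falls back to the active mode and someone has to renew it deliberately.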
This page is educational and should be paired with legal review for final policy interpretation. The operational proof should still come from repeatable evidence: scanner results, audit exports, pull-request checks, policy configuration, and a documented owner for the workflow. That combination is what makes the content useful during buyer diligence, board review, regulatory questions, or an incident investigation.
Check your own workflow
Run a free DPDP scan before this risk reaches production.
Scan prompts, logs, documents, and API payloads for Indian PII exposure, missing redaction, and audit gaps.