Data Retention Requirements under the DPDP Act: What to Keep and When to Delete
This guide explains DPDP Act data retention obligations for Indian companies: the storage limitation principle, retention of AI audit logs and model training data, and deletion workflows.
Storage limitation under the DPDP Act
Section 8(7) of the DPDP Act establishes the storage limitation principle: personal data must not be retained for longer than is necessary for the specified purpose of processing. When the purpose is fulfilled, the personal data must be erased. This principle applies to every store of personal data in your stack: relational databases, object stores, AI logs, embedding databases, caches, and backup tapes.
The Act does not prescribe specific retention periods — those will be set in Rules or sector-specific regulations. Under existing Indian law, certain retention requirements already apply: the PMLA (Prevention of Money Laundering Act) requires KYC records to be retained for five years; the Income Tax Act requires tax-related records for seven years; RBI master directions specify retention periods for financial records. Where these sector regulations set retention periods, they take precedence over the DPDP Act's general storage limitation principle.
For AI-specific data with no sector-specific retention obligation, the default principle is: retain only as long as needed for the stated purpose. If you use customer support conversations to evaluate your LLM's quality and improve its prompts, those conversations should be deleted once the evaluation cycle is complete (typically 30–90 days). If you use them for model fine-tuning, they should be deleted after the fine-tuning run, with the fine-tuned model weights retained only if they cannot be used to recover the original data.
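As a sketch of how these periods can be encoded rather than tracked informally, the retention schedule below maps data categories to a legal or purpose-based basis and a deletion deadline. The category names and the AI-related windows are illustrative assumptions; the statutory periods are the ones cited above and should be confirmed with counsel.

```python
from datetime import timedelta

# Illustrative retention schedule: each entry records why a data category is
# kept and the period after which it should be erased. Category names and the
# AI-specific windows are assumptions for this sketch; the statutory periods
# follow the sector examples above.
RETENTION_SCHEDULE = {
    "kyc_records":         {"basis": "PMLA",                      "period": timedelta(days=5 * 365)},
    "tax_records":         {"basis": "Income Tax Act",            "period": timedelta(days=7 * 365)},
    "support_transcripts": {"basis": "LLM quality evaluation",    "period": timedelta(days=90)},
    "ai_operational_logs": {"basis": "debugging / abuse detection", "period": timedelta(days=90)},
    "ai_compliance_logs":  {"basis": "accountability evidence",   "period": timedelta(days=3 * 365)},
}

def retention_deadline(category: str, collected_at):
    """Return the date by which a record in this category should be erased."""
    return collected_at + RETENTION_SCHEDULE[category]["period"]
```

A schedule like this becomes the single artefact that deletion jobs, audits, and vendor reviews all read from, instead of each team keeping its own informal list.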
AI audit logs: how long to keep them
AI governance audit logs present a tension between two obligations: (a) the storage limitation principle, which pushes towards deletion once the purpose is fulfilled, and (b) the accountability principle, which requires maintaining evidence of compliance. Resolve this tension by distinguishing between operational logs (raw request/response content) and compliance logs (governance decisions and actions).
Operational logs — the actual content of prompts and model responses — should be retained for the minimum period needed for debugging, quality evaluation, and abuse detection. Typically 30–90 days is appropriate. After this period, either delete the content or strip it of personal data, retaining only the metadata (timestamp, model provider, latency, cost, policy decisions applied).
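A minimal sketch of that aging step, assuming a log store that exposes `find_older_than` and `replace` methods (hypothetical names), might look like this:

```python
from datetime import datetime, timedelta, timezone

OPERATIONAL_RETENTION = timedelta(days=90)

# Fields that survive content expiry: enough to analyse usage and show which
# governance decisions were applied, without retaining the personal data itself.
METADATA_FIELDS = {"id", "timestamp", "model_provider", "latency_ms", "cost_usd", "policy_decisions"}

def age_operational_logs(log_store):
    """Strip prompt/response content from operational log records older than the
    retention window, keeping only metadata."""
    cutoff = datetime.now(timezone.utc) - OPERATIONAL_RETENTION
    for record in log_store.find_older_than(cutoff):
        stripped = {k: v for k, v in record.items() if k in METADATA_FIELDS}
        stripped["content_erased_at"] = datetime.now(timezone.utc).isoformat()
        log_store.replace(record["id"], stripped)
```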
Compliance logs — the record of PII detections, policy decisions, consent checks, and redaction actions — should be retained for a longer period aligned with your liability exposure and the limitation periods that apply to claims against you. Since DPDP enforcement is just beginning, a three-year retention period for compliance logs is a reasonable starting point: long enough to defend against historical complaints, short enough to avoid indefinite accumulation of old governance records.
Deleting personal data: technical implementation
Deletion under the DPDP Act means deletion from all stores, not just the primary database. Build a deletion workflow that enumerates every location where a specific user's personal data might exist: main database, analytical databases, data warehouses, AI prompt logs, vector stores, recommendation model training data, caches, CDN caches, email systems, customer support tools, and third-party processors.
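One way to make that enumeration executable is a registry of per-store deletion handlers, so adding a new store means registering a handler rather than remembering to update a runbook. The store names and stub bodies below are placeholders for this sketch:

```python
from typing import Callable, Dict, List

# Registry of per-store deletion handlers. Store names mirror the inventory
# above; the stub bodies would be replaced with real calls into each database,
# log store, or vendor API.
DELETION_HANDLERS: Dict[str, Callable[[str], None]] = {}

def deletion_handler(store: str):
    """Decorator that registers an erasure handler for one data store."""
    def register(fn: Callable[[str], None]):
        DELETION_HANDLERS[store] = fn
        return fn
    return register

@deletion_handler("primary_db")
def delete_from_primary_db(user_id: str) -> None:
    ...  # DELETE the user's rows from transactional tables

@deletion_handler("ai_prompt_logs")
def delete_from_prompt_logs(user_id: str) -> None:
    ...  # remove or redact log entries keyed to this user

@deletion_handler("vector_store")
def delete_embeddings(user_id: str) -> None:
    ...  # delete vectors whose metadata references this user

def erase_data_principal(user_id: str) -> List[str]:
    """Fan the erasure request out to every registered store and return the
    stores that completed, so the workflow produces evidence, not assumptions."""
    completed = []
    for store, handler in DELETION_HANDLERS.items():
        handler(user_id)  # handlers should raise on failure so gaps are visible
        completed.append(store)
    return completed
```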
For AI-specific deletion, the most challenging scenario is personal data used in fine-tuned model weights. If a user requests erasure of their data and some of that data was used to fine-tune a model, you must assess whether the fine-tuned model weights need to be discarded or whether the data's contribution to the weights is negligible. The technical answer is usually that any single training example contributes only diffusely to large model weights, but whether that is enough to satisfy an erasure request is an active area of legal and technical debate.
Implement automated retention policy enforcement rather than relying on manual deletion processes. Set database-level TTLs for logs and transient data. Configure object storage lifecycle policies to move data to cold storage and eventually delete it. Use a retention management system that tracks personal data locations and triggers deletion workflows when retention periods expire. Manual deletion is error-prone and will miss new data stores as your architecture evolves.
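For example, an S3 lifecycle rule can enforce the 30/90-day pattern described above without any scheduled job of your own; the bucket name and prefix here are assumptions:

```python
import boto3

s3 = boto3.client("s3")

# Lifecycle rule for an AI log bucket: move objects to Glacier after 30 days
# and delete them after 90, so operational log retention is enforced by the
# platform rather than by a manual process.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-ai-logs",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-operational-ai-logs",
                "Filter": {"Prefix": "prompt-logs/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 90},
            }
        ]
    },
)
```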
DPDP Act operational checklist
Treat this guide as an operating control, not only as a reference article. The minimum checklist is a data inventory, a stated processing purpose, owner approval, PII detection at the AI boundary, redaction or tokenisation where possible, retention limits, vendor transfer records, and a tested user-rights workflow. This checklist gives engineering and compliance teams a shared language for deciding what must be blocked, what can be allowed in shadow mode, and what needs human review before production release.
For AI systems, the review should include prompts, retrieved context, tool call arguments, model responses, logs, traces, analytics events, exports, and support attachments. Many incidents happen because teams scan only the visible form field while sensitive data moves through background context or observability tooling. CrewCheck's recommended pattern is to place the scanner at the request boundary, record the policy version, and keep audit evidence that shows which identifiers were detected and what action was taken.
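A simplified sketch of that boundary check follows. The regex detectors are crude stand-ins for a real scanner (a production detector should validate checksums and context), and the policy version string is illustrative:

```python
import re
from datetime import datetime, timezone

POLICY_VERSION = "dpdp-policy-2024-07"  # illustrative version identifier

# Stand-in detectors for two Indian identifiers; real scanners should do
# checksum and context validation rather than bare pattern matching.
DETECTORS = {
    "PAN":     re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    "AADHAAR": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),
}

def scan_at_boundary(payload: str, audit_log: list) -> dict:
    """Scan an outbound AI request at the application boundary, record which
    identifiers were detected and under which policy version, and return the
    decision the caller should enforce."""
    detected = [name for name, rx in DETECTORS.items() if rx.search(payload)]
    decision = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "policy_version": POLICY_VERSION,
        "detected": detected,
        "action": "block" if detected else "allow",
    }
    audit_log.append(decision)  # audit evidence, without the raw payload
    return decision
```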
A practical rollout starts with representative samples from production-like traffic. Run a DPDP scan, sort findings by identifier sensitivity and blast radius, fix Aadhaar, PAN, financial, health, children's, and precise-location exposure first, then move to consent wording, retention, deletion, and vendor review. Use shadow mode when false positives could disrupt users, and promote to enforcement only after the exceptions have owners and expiry dates.
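The sketch below shows one way to model shadow mode and owned, expiring exceptions; the field names and mode labels are assumptions:

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class PolicyException:
    rule: str     # e.g. "block-aadhaar-in-prompts"
    owner: str    # named person accountable for the exception
    expires: date # exceptions without expiry become permanent gaps

def effective_action(finding_action: str, mode: str,
                     exceptions: List[PolicyException], rule: str) -> str:
    """Decide what actually happens to a finding. In shadow mode everything is
    logged but allowed; in enforce mode, findings are acted on unless a live,
    owned exception covers the rule."""
    if mode == "shadow":
        return "log_only"
    for exc in exceptions:
        if exc.rule == rule and exc.expires >= date.today():
            return "allow_with_exception"
    return finding_action  # e.g. "block" or "redact"
```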
This page is educational and should be paired with legal review for final policy interpretation. The operational proof should still come from repeatable evidence: scanner results, audit exports, pull-request checks, policy configuration, and a documented owner for the workflow. That combination is what makes the content useful during buyer diligence, board review, regulatory questions, or an incident investigation.
DPDP Act pillar implementation addendum
A pillar page should also connect the legal idea to a concrete implementation path. Start with ownership: name the product owner, engineering owner, security reviewer, and compliance reviewer for this topic. Then map the systems that can create, store, transform, or transmit the relevant personal data. The map should include frontend forms, backend APIs, queues, warehouses, LLM prompts, embedding stores, admin exports, vendor dashboards, and customer-success tooling.
Next, document the lawful purpose and the user-facing notice. The notice should be clear enough that a data principal understands what is processed, why AI may be involved, what categories of personal data are affected, and how consent or withdrawal works. If the workflow supports children, healthcare, financial services, employment, or government delivery, treat that context as higher risk and add stricter review before allowing personal data into model calls.
The engineering control should run before data leaves the application boundary. Scan the full prompt package, not just the user's message. That means system instructions, retrieved snippets, tool outputs, attachments, OCR text, chat history, and structured JSON all need inspection. When a high-confidence identifier is found, redact, tokenise, block, or route to a safer model depending on the policy. Keep the original sensitive value out of general logs unless a protected exception is approved.
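A sketch of that full-package inspection, with illustrative attribute names on the request object and `scan_text` standing in for whatever detector you actually run:

```python
def collect_prompt_package(request) -> dict:
    """Gather every text component that will leave the application boundary,
    not just the visible user message. Attribute names on `request` are
    illustrative; map them to your own request object."""
    return {
        "system_instructions": request.system_prompt,
        "user_message":        request.user_message,
        "retrieved_snippets":  "\n".join(request.rag_chunks),
        "tool_outputs":        "\n".join(request.tool_outputs),
        "attachments_ocr":     "\n".join(request.ocr_texts),
        "chat_history":        "\n".join(m["content"] for m in request.history),
        "structured_json":     request.structured_payload_json,
    }

def scan_package(package: dict, scan_text) -> dict:
    """Run the PII scanner over every component so findings can be traced to
    the part of the prompt that carried them. `scan_text` is any callable that
    returns the detected identifier types for a string."""
    return {part: scan_text(text) for part, text in package.items() if text}
```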
Audit evidence should be designed for reconstruction. A reviewer should be able to answer: when did the request happen, which application sent it, which data type was detected, which rule fired, what action was taken, which provider received the final payload, and who approved any exception. Without that trail, teams are left with policy claims rather than proof. With it, they can respond faster to buyer diligence, internal audits, breach triage, and regulator questions.
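One possible shape for such a record, with illustrative field names covering each of those questions:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class AuditRecord:
    """One reconstructable governance event, answering the reviewer's
    questions listed above. Field names are illustrative."""
    application: str                          # which application sent the request
    data_type: str                            # e.g. "AADHAAR", "PAN", "HEALTH"
    rule_id: str                              # which policy rule fired
    action: str                               # "redact", "tokenise", "block", "route"
    provider: str                             # which provider received the final payload
    exception_approver: Optional[str] = None  # who approved any exception
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```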
Finally, make the process repeatable. Add sample payloads to tests, run scheduled scans against logs and representative documents, check sitemap and page health for public guidance, and keep the DPDP scanner linked from the page so readers can move from learning to action. The goal is not to freeze the system; it is to make every future AI workflow easier to review, safer to launch, and easier to explain.
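A minimal regression test along those lines, using fabricated sample identifiers and a stand-in scanner:

```python
# test_dpdp_samples.py: scheduled/CI check that representative payloads with
# synthetic Indian identifiers are still flagged. The sample values below are
# fabricated and follow only the surface format; scan_text stands in for
# whatever scanner the team actually runs.
import re

def scan_text(text: str) -> list:
    detectors = {
        "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
        "AADHAAR": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),
    }
    return [name for name, rx in detectors.items() if rx.search(text)]

def test_support_transcript_with_pan_is_flagged():
    payload = "Customer shared PAN ABCDE1234F while updating billing details."
    assert "PAN" in scan_text(payload)

def test_clean_payload_is_not_flagged():
    payload = "Customer asked how to change their delivery address."
    assert scan_text(payload) == []
```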
Check your own workflow
Run a free DPDP scan before this risk reaches production.
Scan prompts, logs, documents, and API payloads for Indian PII exposure, missing redaction, and audit gaps.