Data Types

Indian PII Types: Complete Reference Guide

All Indian PII types covered under DPDP — Aadhaar, PAN, UPI, IFSC, Voter ID, Passport, mobile, email — with detection patterns and compliance notes.

16 min read · Updated 2026-05-04

What Qualifies as PII Under DPDP?

DPDP Section 2(t) defines personal data as 'any data about an individual who is identifiable by or in relation to such data'. This is broader than traditional PII definitions — it includes any data that could identify a person directly or in combination with other data.

For practical purposes, Indian PII falls into two tiers: directly identifying data (Aadhaar, PAN, mobile number, email, Voter ID, Passport) that identifies a person on its own; and quasi-identifying data (name, date of birth, address, employer, salary) that requires combination to identify someone. Both are personal data under DPDP.

Aadhaar Number

Format: 12 digits. The first digit is 2–9 (never 0 or 1). The 12th digit is a Verhoeff check digit computed over the first 11, so validation runs across all 12 digits. Issued by UIDAI to Indian residents. Example layout (format only): 2345 6789 0123.

DPDP compliance notes: Aadhaar is the highest-sensitivity PII in India. The Aadhaar Act 2016 Section 29 prohibits publishing or displaying Aadhaar numbers. Using Aadhaar for purposes beyond identity verification is illegal. Never log, transmit, or store full Aadhaar numbers — store only the last 4 digits for display, and use a UIDAI-approved authentication flow for verification. Under DPDP, the explicit consent requirement applies with extra force for Aadhaar.

Detection: 12-digit sequences with Verhoeff checksum validation. False positive risk: any 12-digit number matches the format pattern. Always validate checksum before treating as Aadhaar.
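The detection rule above can be sketched as a shape check followed by a Verhoeff check. This is a minimal illustration, not a UIDAI-supplied implementation: the function names are hypothetical, and the tables are the standard published Verhoeff tables.

```python
import re

# Standard Verhoeff tables: D is the dihedral-group multiplication table,
# P the position-dependent permutation, INV the inverse lookup.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]

# Shape from the format description: 12 digits, first digit 2-9.
AADHAAR_SHAPE = re.compile(r"[2-9][0-9]{11}")


def verhoeff_check_digit(payload: str) -> int:
    """Return the check digit to append to a digit string."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return INV[c]


def verhoeff_valid(number: str) -> bool:
    """True if the digit string (check digit last) passes Verhoeff."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def is_aadhaar(candidate: str) -> bool:
    """Hypothetical helper: strip spaces/hyphens, check shape, then checksum."""
    digits = re.sub(r"[\s-]", "", candidate)
    return bool(AADHAAR_SHAPE.fullmatch(digits)) and verhoeff_valid(digits)
```

For any 11-digit prefix exactly one final digit passes the checksum, so about nine in ten random 12-digit numbers are rejected. The check reduces false positives but does not prove that a number was actually issued.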

PAN Card Number

Format: 10 characters — AAAAA9999A (5 uppercase letters, 4 digits, 1 uppercase letter). The 4th character encodes entity type (P=person, C=company, H=HUF, etc.). The 5th character is the first letter of the surname for individuals. Issued by Income Tax Department.

DPDP compliance notes: PAN is a tax identifier, sensitive in both financial and identity contexts. It must not be logged or transmitted in plaintext. PAN is required for many financial transactions over ₹50,000. Common surfaces: loan applications, investment accounts, and high-value e-commerce. Detection: regex [A-Z]{5}[0-9]{4}[A-Z]. The 10th character is a check letter, but its algorithm is not publicly documented, so tighten matches structurally instead: the 4th character must be a valid entity-type code.
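A minimal sketch of structural PAN detection, assuming the 4th character is restricted to the commonly listed entity-type codes. The helper names and the masking convention (keep the last four characters) are illustrative assumptions, not an Income Tax Department specification.

```python
import re
from typing import List, Tuple

# Commonly listed entity-type codes for the 4th PAN character.
ENTITY_TYPES = {
    "P": "individual",
    "C": "company",
    "H": "HUF",
    "F": "firm",
    "A": "association of persons",
    "T": "trust",
    "B": "body of individuals",
    "L": "local authority",
    "J": "artificial juridical person",
    "G": "government",
}

# 3 letters, entity-type letter, 1 letter, 4 digits, 1 letter.
PAN_RE = re.compile(r"\b[A-Z]{3}([ABCFGHJLPT])[A-Z][0-9]{4}[A-Z]\b")


def find_pans(text: str) -> List[Tuple[str, str]]:
    """Return (pan, entity_type) pairs found in the text."""
    return [(m.group(0), ENTITY_TYPES[m.group(1)]) for m in PAN_RE.finditer(text)]


def mask_pan(pan: str) -> str:
    """Assumed masking convention: hide everything except the last 4 characters."""
    return "X" * 6 + pan[-4:]
```

Restricting the 4th character removes many random 10-character matches (order IDs, SKUs) that the plain pattern would flag.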

UPI ID

Format: virtualaddress@upihandle (e.g., name@upi, 9876543210@paytm, user@okicici). Issued by UPI-registered banks and payment apps. Links directly to a bank account.

DPDP compliance notes: UPI IDs are financial data with direct payment capability. Knowing someone's UPI ID enables unsolicited payment requests (request-based fraud). UPI IDs should be masked in logs and UI displays. Detection pattern: [A-Za-z0-9._-]+@[A-Za-z]+ (the local part may contain dots and hyphens, and the handle has no dot or TLD, which separates UPI IDs from email addresses), validated against a list of known UPI handles (paytm, upi, okicici, okhdfcbank, ybl, ibl, apl, etc.).
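A sketch of handle-validated UPI detection. The handle set is a small illustrative sample that would need to be maintained from a current handle list, and the length bounds and masking rule are assumptions.

```python
import re
from typing import List

# Illustrative, incomplete sample of UPI handles; keep this list up to date.
KNOWN_HANDLES = {
    "upi", "paytm", "okicici", "okhdfcbank", "okaxis", "oksbi", "ybl", "ibl", "apl",
}

UPI_RE = re.compile(
    r"(?<![A-Za-z0-9._-])"                    # not in the middle of a longer token
    r"([A-Za-z0-9._-]{2,256})@([A-Za-z]{2,64})"
    r"(?![A-Za-z0-9_-])(?!\.[A-Za-z0-9])"     # handle must not continue into a domain
)


def find_upi_ids(text: str) -> List[str]:
    """Return UPI IDs whose handle is in the known-handles set."""
    return [
        m.group(0)
        for m in UPI_RE.finditer(text)
        if m.group(2).lower() in KNOWN_HANDLES
    ]


def mask_upi(upi_id: str) -> str:
    """Assumed masking rule: keep the first 2 characters of the local part."""
    local, handle = upi_id.split("@", 1)
    return local[:2] + "*" * max(len(local) - 2, 0) + "@" + handle
```

The second lookahead rejects strings such as user@gmail.com, where the part after @ continues into a dotted domain, while still accepting a UPI ID followed by a sentence-ending full stop.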

IFSC Code + Account Number

IFSC format: 11 characters — AAAA0XXXXXX (4-letter bank code, 0, 6-character branch code). IFSC alone is not PII — it identifies a bank branch, not an individual. However, IFSC + account number + name = highly sensitive financial PII.

DPDP compliance: the combination of IFSC, account number, and name is sufficient for fraudulent bank transfers. Detect and redact this combination in LLM contexts, support tickets, and logs. Account number pattern varies by bank: 9–18 digits typically.
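Because IFSC alone is not PII, a detector can flag only the combination. The sketch below pairs each IFSC match with 9 to 18 digit runs within a character window. The window size and function names are assumptions, and a 10-digit mobile number near an IFSC code would also be flagged.

```python
import re
from typing import List, Tuple

# 4-letter bank code, a literal 0, then a 6-character branch code.
IFSC_RE = re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b")
# Candidate account numbers: a standalone run of 9-18 digits.
ACCOUNT_RE = re.compile(r"(?<!\d)\d{9,18}(?!\d)")


def find_bank_details(text: str, window: int = 80) -> List[Tuple[str, str]]:
    """Return (ifsc, account_number) pairs that appear within `window` characters."""
    accounts = list(ACCOUNT_RE.finditer(text))
    pairs = []
    for ifsc in IFSC_RE.finditer(text):
        for acct in accounts:
            # Distance between the two matches, whichever comes first.
            gap = max(acct.start() - ifsc.end(), ifsc.start() - acct.end())
            if 0 <= gap <= window:
                pairs.append((ifsc.group(0), acct.group(0)))
    return pairs


def mask_account(number: str) -> str:
    """Show only the last 4 digits."""
    return "*" * (len(number) - 4) + number[-4:]
```

An IFSC code with no nearby digit run returns no pairs, so branch lookups and bank directories are not redacted unnecessarily.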

Indian Mobile Numbers

Format: 10 digits starting with 6, 7, 8, or 9. International format: +91 followed by 10 digits. Common variations in text: with/without +91, with/without spaces or hyphens.

DPDP compliance: mobile numbers are directly identifying and are the primary identifier for many Indian digital identities (OTP-based auth, UPI, Aadhaar-linked mobile). Redact in all contexts before LLM processing. Detection normalisation: strip non-digits, strip a leading 91 country code or 0 trunk prefix, then validate that the remaining 10 digits start with 6–9.
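The normalisation steps can be sketched as follows. The function name is hypothetical, and the prefix is stripped only when the total length confirms it is a prefix, so a valid 10-digit number that happens to start with 91 is left intact.

```python
import re
from typing import Optional


def normalise_indian_mobile(raw: str) -> Optional[str]:
    """Return the bare 10-digit mobile number, or None if the input is not one."""
    digits = re.sub(r"\D", "", raw)          # drop +, spaces, hyphens, brackets
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]                  # country code
    elif len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]                  # trunk prefix
    return digits if re.fullmatch(r"[6-9][0-9]{9}", digits) else None
```

For example, "+91 98765-43210" and "09876543210" both normalise to "9876543210", which lets a redaction layer deduplicate the same number written in different styles.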

Email Addresses

Standard email format: local@domain.tld. Indian emails follow the same format as global emails, so there is no India-specific pattern. However, certain domain suffixes are India-specific: .in and .co.in (companies), .ac.in (colleges and universities), and .gov.in and .nic.in (government).

DPDP compliance: email is directly identifying and typically used for account access. While not as uniquely sensitive as Aadhaar or PAN, email combined with other data creates a rich personal profile. Redact in LLM prompts and logs. Detection: standard RFC 5322 email regex — be aware of false positives in code samples and config files.

Voter ID (EPIC Number)

Format: 10 characters: 3 uppercase letters followed by 7 digits (e.g., ABC1234567). Issued by the Election Commission of India. Full name: Electors Photo Identity Card (EPIC).

DPDP compliance: Voter ID is a government ID that uniquely identifies an Indian citizen. Contains name, address, and constituency — a rich identity document. Should be treated with same sensitivity as Passport. Detection: [A-Z]{3}[0-9]{7} pattern.

Passport Number

Format: 8 characters: 1 uppercase letter followed by 7 digits (A1234567). Issued by the Ministry of External Affairs. The passport itself records full name, date of birth, nationality, and place of birth.

DPDP compliance: Passport is the gold-standard identity document. Highest re-identification risk in combination with other data. In KYC flows, treat passport scans as the most sensitive PII category. Never log passport numbers. Detection: [A-Z][0-9]{7}.

Driving Licence

Format: state code (2 letters) + RTO code (2 digits) + year (4 digits) + 7 digits, 15 characters in total (e.g., MH0120120123456). Issued by Regional Transport Offices. Contains: name, address, date of birth, vehicle classes.

DPDP compliance: the driving licence is commonly used as a KYC document for non-Aadhaar identity verification. It contains address data in addition to identity data. Detection: the printed format varies by state (spaces or hyphens often separate the parts), but once separators are removed it follows the [A-Z]{2}[0-9]{2}[0-9]{4}[0-9]{7} pattern.
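A sketch of driving licence parsing under the 15-character layout described above. The separator handling and the year plausibility window (1950 to the current year) are assumptions added to cut false positives, not part of any official specification.

```python
import re
from datetime import date
from typing import Optional

# State code, RTO code, year of issue, 7-digit serial.
DL_RE = re.compile(r"([A-Z]{2})([0-9]{2})([0-9]{4})([0-9]{7})")


def parse_driving_licence(raw: str, today: Optional[date] = None) -> Optional[dict]:
    """Return the licence components, or None if the value does not fit the layout."""
    today = today or date.today()
    compact = re.sub(r"[\s-]", "", raw.upper())   # tolerate spaces and hyphens
    m = DL_RE.fullmatch(compact)
    if not m:
        return None
    state, rto, year, serial = m.groups()
    if not 1950 <= int(year) <= today.year:       # assumed plausibility window
        return None
    return {"state": state, "rto": rto, "year": int(year), "serial": serial}
```

The year check is what separates a licence number from an arbitrary 2-letter, 13-digit reference code.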

GST Number (GSTIN)

Format: 15 characters: 2-digit state code + 10-character PAN + 1 alphanumeric entity code + Z + 1 alphanumeric check character (e.g., 29ABCDE1234F1Z5). GSTIN embeds the PAN of the registered taxpayer. For a sole proprietorship that is the owner's personal PAN, which makes GSTIN a combined business and individual identifier.

DPDP compliance: GSTIN reveals the PAN of the business owner or partner. In B2B contexts where individual proprietors or partners are involved, GSTIN is personal data. Detection: [0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]. The 14th character is always Z, and the entity code and check character may each be a letter or a digit.
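A sketch of GSTIN validation and PAN extraction. The check-character routine is the widely circulated mod-36 scheme with alternating weights; it is an assumption here and should be confirmed against official GSTN documentation before being relied on. Function names are hypothetical.

```python
import re

GSTIN_RE = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def gstin_check_char(first14: str) -> str:
    """Compute the 15th character from the first 14 (assumed mod-36 scheme)."""
    total = 0
    for i, ch in enumerate(first14):
        product = ALPHABET.index(ch) * (2 if i % 2 else 1)   # weights 1, 2, 1, 2, ...
        total += product // 36 + product % 36
    return ALPHABET[(36 - total % 36) % 36]


def is_gstin(candidate: str) -> bool:
    """Shape check plus check-character comparison."""
    return (
        bool(GSTIN_RE.fullmatch(candidate))
        and gstin_check_char(candidate[:14]) == candidate[14]
    )


def embedded_pan(gstin: str) -> str:
    """Characters 3-12 of a GSTIN are the registrant's PAN."""
    return gstin[2:12]
```

Extracting the embedded PAN lets a redaction layer apply its PAN policy to GSTINs as well, so a proprietor's PAN is not exposed through a business identifier.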

Bank Account Number

Format: variable length, 9–18 digits. Each Indian bank has its own format. In combination with IFSC and name, uniquely identifies an individual's financial account.

DPDP compliance: bank account numbers enable financial fraud. Should be masked everywhere except in secure banking interfaces (and even there, only last 4 digits displayed). Detection challenge: no universal pattern — detect by context (adjacent IFSC code, account type keywords like 'savings account number', 'current account').

Date of Birth

Format: DD/MM/YYYY, DD-MM-YYYY, YYYY-MM-DD, and other variants. By itself, date of birth is quasi-identifying (not unique). Combined with name, it becomes a key identity anchor.

DPDP compliance: date of birth is used for identity verification, KYC, and as an authentication factor. It's also required to assess whether DPDP Section 9 (children's data) applies. Should be masked in logs and limited to access-need contexts. Detecting DoB in LLM prompts: look for date patterns adjacent to keywords like 'born', 'DOB', 'date of birth'.
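A sketch of keyword-anchored date-of-birth detection, with a helper for the children's-data check. The keyword list, the 20-character gap, and the function names are assumptions. The age threshold of 18 follows the DPDP definition of a child.

```python
import re
from datetime import date, datetime
from typing import List, Optional

DATE = r"(?:\d{1,2}[/-]\d{1,2}[/-]\d{4}|\d{4}-\d{2}-\d{2})"
KEYWORD = r"(?:date\s+of\s+birth|d\.?o\.?b\.?|born)"
# Keyword, then up to 20 non-digit characters on the same line, then a date.
DOB_RE = re.compile(
    rf"(?<!\w){KEYWORD}(?!\w)[^\d\n]{{0,20}}({DATE})", re.IGNORECASE
)


def find_dobs(text: str) -> List[str]:
    """Return date strings that directly follow a date-of-birth keyword."""
    return [m.group(1) for m in DOB_RE.finditer(text)]


def parse_dob(value: str) -> Optional[date]:
    """Try the three formats listed above, day-first before ISO."""
    for fmt in ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def is_child(dob: date, today: date) -> bool:
    """True if the person has not completed 18 years on `today`."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    age = today.year - dob.year - (0 if had_birthday else 1)
    return age < 18
```

Anchoring on keywords keeps order dates, invoice dates, and timestamps out of the results, at the cost of missing dates of birth that appear without a label.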

Biometric Data

Types: fingerprints, iris scans, facial recognition vectors, voice prints, gait patterns. India-specific: Aadhaar biometric authentication uses fingerprint and iris. Financial services use fingerprint-based EKYC. Attendance systems use fingerprint or facial recognition widely.

DPDP compliance: biometric data is inherently permanent and uniquely identifying. A leaked fingerprint or facial vector cannot be 'changed' like a password. DPDP does not define a separate biometric category, but in practice biometric data warrants the highest level of protection: explicit, specific consent; strict purpose limitation; cross-border transfer only to destinations the Central Government has not restricted under Section 16; and encryption at rest and in transit at all times.

Health and Medical Data

Types: diagnosis codes (ICD-10), medication names, medical record numbers, health account and insurance IDs, lab results, imaging reports. In India, the ABHA (Ayushman Bharat Health Account) number is a 14-digit health ID linked to the ABDM ecosystem.

DPDP compliance: health data is among the most sensitive categories — it affects insurance eligibility, employment, and personal relationships. Highly re-identifiable. LLM applications in healthcare must apply maximum redaction. Detect ABHA: 14-digit numbers in health contexts. Detect medical data by vocabulary: diagnosis names, medication names, ICD codes.

Location Data

Types: GPS coordinates, IP addresses (geo-resolved), home/work address, delivery address. Indian-specific: Pincode (6 digits, identifies postal area) combined with street address is precise location data.

DPDP compliance: precise location data reveals home address, workplace, religious institutions visited, medical facilities visited — highly sensitive in combination. GPS history is especially sensitive. For LLM applications: redact precise coordinates and full addresses from prompts; retain only the city/state level if needed for context.

Financial Transaction Data

Types: UPI transaction IDs, credit/debit card numbers (typically 15–16 digits, 13–19 across networks, validated by the Luhn algorithm), CVV, expiry dates, transaction amounts with merchant names. Credit/debit card numbers are additionally covered by PCI-DSS requirements.

DPDP compliance: transaction data in combination reveals spending patterns, financial capacity, and personal relationships. Card numbers are actionable for fraud without additional data. Detection: Luhn-validated 13–19 digit sequences for card numbers. UPI transaction reference numbers vary in format by bank.
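Card number detection can be sketched as a digit-run match followed by a Luhn check. The separator handling (single spaces or hyphens between digits) and the function names are assumptions.

```python
import re
from typing import List

# 13-19 digits, each optionally followed by one space or hyphen.
CARD_CANDIDATE_RE = re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return separator-free card numbers that pass the Luhn check."""
    found = []
    for m in CARD_CANDIDATE_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group(0))
        if luhn_valid(digits):
            found.append(digits)
    return found
```

The widely used test number 4111 1111 1111 1111 passes this check, which makes it a convenient fixture for redaction tests that must not contain real card data.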

Social Media Identifiers

Types: Twitter/X handles, Instagram usernames, WhatsApp numbers, LinkedIn profiles, Facebook UIDs. In India, WhatsApp is the primary messaging platform — WhatsApp number = mobile number, already covered by mobile PII.

DPDP compliance: social media handles are publicly visible but are personal data — combining a handle with other data can create a comprehensive personal profile. For LLM applications: handles in customer support contexts often indicate the customer wants to be contacted, making them contact data that should be protected.

Employment and Salary Data

Types: employer name, employee ID, CTC (cost to company), salary slip details, Form 16 data, EPF (Employees' Provident Fund) account number and UAN, gratuity, designations, performance ratings.

DPDP compliance: salary and employment data is deeply personal, and discriminatory use is common. In India, salary history is frequently requested by employers. DPDP doesn't explicitly ban this, but purpose limitation principles apply. The 12-digit UAN (Universal Account Number) that ties together a member's EPF accounts is linked to Aadhaar and is highly sensitive. Detection: look for salary-adjacent vocabulary ('CTC', 'LPA', 'per annum') with numeric values.

Caste, Religion, and Political Data

DPDP doesn't create a separate 'sensitive categories' tier like GDPR's Article 9, but caste, religious affiliation, and political views are clearly personal data that could be used in discriminatory ways.

Compliance consideration: for AI systems making recommendations or decisions, using caste or religious data as a feature is legally risky under India's anti-discrimination laws even if DPDP doesn't explicitly call it out. AI governance policy should explicitly prohibit using these data points in decisioning models.

Implementation Addendum

The legal definitions above need a concrete implementation path. Start with ownership: name the product owner, engineering owner, security reviewer, and compliance reviewer for this topic. Then map the systems that can create, store, transform, or transmit the relevant personal data. The map should include frontend forms, backend APIs, queues, warehouses, LLM prompts, embedding stores, admin exports, vendor dashboards, and customer-success tooling.

Next, document the lawful purpose and the user-facing notice. The notice should be clear enough that a data principal understands what is processed, why AI may be involved, what categories of personal data are affected, and how consent or withdrawal works. If the workflow supports children, healthcare, financial services, employment, or government delivery, treat that context as higher risk and add stricter review before allowing personal data into model calls.

The engineering control should run before data leaves the application boundary. Scan the full prompt package, not just the user's message. That means system instructions, retrieved snippets, tool outputs, attachments, OCR text, chat history, and structured JSON all need inspection. When a high-confidence identifier is found, redact, tokenise, block, or route to a safer model depending on the policy. Keep the original sensitive value out of general logs unless a protected exception is approved.
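A sketch of scanning the full prompt package rather than only the user's message. The package layout, detector set, and token format are all hypothetical. A production system would plug in the checksum-validated detectors for each identifier and keep the vault in protected storage.

```python
import re
from typing import Any, Dict

# Deliberately small, illustrative detector set.
DETECTORS = {
    "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    "MOBILE": re.compile(r"(?<!\d)(?:\+91[ -]?)?[6-9]\d{9}(?!\d)"),
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def redact_text(text: str, vault: Dict[str, str]) -> str:
    """Replace each detected identifier with a token and record it in the vault."""
    for label, pattern in DETECTORS.items():
        def _swap(match, label=label):
            token = f"<{label}_{len(vault) + 1}>"
            vault[token] = match.group(0)   # original value stays out of logs
            return token
        text = pattern.sub(_swap, text)
    return text


def redact_package(node: Any, vault: Dict[str, str]) -> Any:
    """Walk strings, lists, and dicts so every part of the prompt is inspected."""
    if isinstance(node, str):
        return redact_text(node, vault)
    if isinstance(node, list):
        return [redact_package(item, vault) for item in node]
    if isinstance(node, dict):
        return {key: redact_package(value, vault) for key, value in node.items()}
    return node   # numbers, booleans, None pass through unchanged
```

Tokenising instead of deleting keeps the prompt readable for the model and allows the application to restore values in the response, provided the vault never leaves the application boundary.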

Audit evidence should be designed for reconstruction. A reviewer should be able to answer: when did the request happen, which application sent it, which data type was detected, which rule fired, what action was taken, which provider received the final payload, and who approved any exception. Without that trail, teams are left with policy claims rather than proof. With it, they can respond faster to buyer diligence, internal audits, breach triage, and regulator questions.
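The reviewer questions above map naturally onto a structured audit record. The sketch below is one possible shape: the field names, the keyed-fingerprint approach, and the example values are assumptions, not a prescribed DPDP format.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RedactionAuditEvent:
    timestamp: str                 # when did the request happen
    application: str               # which application sent it
    data_type: str                 # which data type was detected
    rule_id: str                   # which rule fired
    action: str                    # redact / tokenise / block / route
    provider: str                  # which provider received the final payload
    value_fingerprint: str         # keyed hash, never the raw value
    exception_approved_by: Optional[str] = None


def make_event(application: str, data_type: str, rule_id: str, action: str,
               provider: str, raw_value: str, key: bytes,
               approver: Optional[str] = None) -> RedactionAuditEvent:
    """Build an event that can be correlated without storing the identifier."""
    fingerprint = hmac.new(key, raw_value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    return RedactionAuditEvent(
        timestamp=datetime.now(timezone.utc).isoformat(),
        application=application,
        data_type=data_type,
        rule_id=rule_id,
        action=action,
        provider=provider,
        value_fingerprint=fingerprint,
        exception_approved_by=approver,
    )


def to_log_line(event: RedactionAuditEvent) -> str:
    """Serialise as one JSON line for an append-only audit log."""
    return json.dumps(asdict(event), sort_keys=True)
```

A keyed HMAC is used rather than a plain hash because identifiers such as mobile numbers have a small search space and an unkeyed hash could be reversed by enumeration.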

Finally, make the process repeatable. Add sample payloads to tests and run scheduled scans against logs and representative documents. The goal is not to freeze the system; it is to make every future AI workflow easier to review, safer to launch, and easier to explain.

