
Verhoeff Checksum

The mathematical algorithm that separates real Aadhaar numbers from random 12-digit sequences — reducing false positives by over 90%

Key Takeaways

  • The Verhoeff algorithm detects all single-digit substitution errors and all adjacent transposition errors in numeric sequences
  • Aadhaar's 12th digit is a check digit computed from the first 11 using Verhoeff, which enables mathematical validation
  • Regex-only Aadhaar detection has 40-60% false positives; adding Verhoeff drops this to under 5%
  • The algorithm uses three mathematical tables (multiplication, permutation, inverse) for O(n) validation

What Is the Verhoeff Checksum?

The Verhoeff checksum is a mathematical error-detection algorithm invented by Dutch mathematician Jacobus Verhoeff in 1969. Unlike simpler checksum methods (like Luhn), it detects all single-digit substitution errors AND all adjacent transposition errors — the two most common types of human data entry mistakes.

The Unique Identification Authority of India (UIDAI) chose the Verhoeff algorithm for Aadhaar numbers specifically because of this superior error detection. The 12th digit of every Aadhaar number is a check digit computed from the first 11 digits using the Verhoeff algorithm.

For AI governance, this means we can mathematically validate whether a 12-digit number is likely a real Aadhaar number — not just any random sequence of digits. This single validation step eliminates the majority of false positives in PII detection.

Why Regex Alone Fails for Aadhaar Detection

The naive approach to Aadhaar detection is matching any 12-digit number. Here's why that produces unacceptable results in production:

Regex-Only (\d{12})

  • Matches timestamps (202605011430)
  • Matches order IDs and invoice numbers
  • Matches phone numbers with country codes
  • Matches random numeric sequences in code
  • 40-60% false positive rate in production
  • Alert fatigue makes teams ignore real detections

Regex + Verhoeff Validation

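  • Rejects most timestamps (roughly 9 in 10 fail the checksum)
  • Rejects most order IDs and invoice numbers (roughly 9 in 10 fail the checksum)
  • Rejects most phone numbers with country codes (roughly 9 in 10 fail the checksum)
  • Rejects most random sequences (about 90% fail; 1 in 10 passes by coincidence)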
  • Under 5% false positive rate
  • High-confidence alerts that teams trust

How the Algorithm Works

The Verhoeff algorithm operates on three mathematical lookup tables and processes digits from right to left:

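1. Multiplication Table (d): A 10×10 table defining a non-commutative multiplication operation on digits 0-9. Unlike normal multiplication, d(a,b) and d(b,a) can differ (for example, d(1,5) = 6 but d(5,1) = 9). Combined with the position-dependent permutation below, this is what enables transposition detection.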

2. Permutation Table (p): An 8×10 table that applies different permutations to each digit based on its position in the number. This ensures that the same digit in different positions contributes differently to the checksum.

3. Inverse Table (inv): A 10-element table used in the final step to compute the check digit from the accumulated result.
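The three tables do not have to be typed in by hand. As a rough sketch, assuming the standard construction from the dihedral group D5 (the symmetries of a pentagon) and the commonly published base permutation, they can be generated as shown here. The function name `buildVerhoeffTables` is hypothetical and used only for illustration.

```javascript
// Sketch: generate the Verhoeff tables instead of hard-coding them.
// Assumes the standard construction from the dihedral group D5.
function buildVerhoeffTables() {
  // d[a][b]: digits 0-4 behave as rotations, digits 5-9 as reflections.
  const d = [];
  for (let a = 0; a < 10; a++) {
    const row = [];
    for (let b = 0; b < 10; b++) {
      // The low part adds when a is a rotation and subtracts when a is a reflection.
      const low = a < 5 ? (a + b) % 5 : (((a - b) % 5) + 5) % 5;
      // The result is a reflection when exactly one operand is a reflection.
      const reflected = (a < 5) !== (b < 5);
      row.push(reflected ? low + 5 : low);
    }
    d.push(row);
  }

  // p[i]: the base permutation applied i times (row 0 is the identity).
  const base = [1, 5, 7, 6, 2, 8, 3, 0, 9, 4];
  const p = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]];
  for (let i = 1; i < 8; i++) {
    p.push(base.map(j => p[i - 1][j]));
  }

  // inv[a]: the digit b for which d[a][b] is 0.
  const inv = d.map(row => row.indexOf(0));
  return { d, p, inv };
}

const { d, p, inv } = buildVerhoeffTables();
console.log(d[1][5], d[5][1]); // 6 9: the order of the operands matters
console.log(inv.join(''));     // 0432156789
```

The base permutation has order 8, which is why the permutation table needs only eight rows and why position indexes are taken modulo 8.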

The validation process: iterate through all 12 digits from right to left, applying the permutation based on position, then multiplying into a running total using the d-table. If the final result is 0, the number is valid.
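The validation loop described above can be sketched in a few lines. The tables are the standard published Verhoeff tables; the function name `verhoeff_check` follows the pipeline example in this article, and the short inputs are classic textbook values rather than Aadhaar numbers.

```javascript
// Standard Verhoeff multiplication table (D) and permutation table (P).
const D = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
  [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
  [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
  [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
  [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
  [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
  [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
  [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
  [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
];
const P = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
  [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
  [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
  [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
  [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
  [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
  [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
];

// Returns true when the digit string (check digit included) passes the check.
function verhoeff_check(numStr) {
  if (!/^\d+$/.test(numStr)) return false; // accept digits only
  const digits = numStr.split('').reverse().map(Number); // right to left
  let c = 0; // running total
  for (let i = 0; i < digits.length; i++) {
    // Permute by position (mod 8), then multiply into the running total.
    c = D[c][P[i % 8][digits[i]]];
  }
  return c === 0;
}

console.log(verhoeff_check('2363')); // true
console.log(verhoeff_check('2336')); // false (last two digits swapped)
console.log(verhoeff_check('2364')); // false (last digit changed)
```

The inverse table is not needed for validation. It only comes into play when generating a check digit for a new number.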

Implementation in PII Detection Pipelines

Here's how Verhoeff validation fits into a production PII detection pipeline:

// Detection pipeline for Aadhaar numbers (JavaScript)
// Assumes `text` holds the input string and that verhoeff_check and
// has_identity_context are defined elsewhere.

// Step 1: Extract candidates via regex (the g flag returns every match)
const candidates = text.match(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g) ?? [];

// Step 2: Normalize (remove spaces/hyphens)
const normalized = candidates.map(c => c.replace(/[\s-]/g, ''));

// Step 3: Verhoeff validation
const validated = normalized.filter(n => verhoeff_check(n));

// Step 4: Context scoring (optional)
const confident = validated.filter(n => has_identity_context(n, text));

// Result: only candidates that pass the checksum and the context check proceed to masking

The Verhoeff check itself is O(n) where n is the number of digits — for 12-digit Aadhaar numbers, it's essentially constant time. This adds negligible latency to the detection pipeline.

Error Detection Capabilities

The Verhoeff algorithm's error detection capabilities make it ideal for identity number validation:

  • Single-digit errors (100%): Detects every case where one digit is changed to another
  • Adjacent transpositions (100%): Detects every case where two neighboring digits are swapped
  • Twin errors (98%): Detects most cases where aa→bb (e.g., 11→22)
  • Jump transpositions (96%): Detects most cases where digits separated by one position are swapped
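The two 100% guarantees can be checked by brute force. The sketch below is illustrative only: it builds a checksum-valid 12-digit number from an arbitrary, made-up payload (not a real Aadhaar number), applies every single-digit change and every adjacent swap, and counts how many corrupted versions still pass. The helper names `check` and `countUndetected` are hypothetical.

```javascript
// Standard Verhoeff tables, written as digit strings to keep the sketch compact.
const toRows = rows => rows.map(r => [...r].map(Number));
const D = toRows(['0123456789', '1234067895', '2340178956', '3401289567', '4012395678',
                  '5987604321', '6598710432', '7659821043', '8765932104', '9876543210']);
const P = toRows(['0123456789', '1576283094', '5803796142', '8916043527',
                  '9453126870', '4286573901', '2793806415', '7046913258']);

// True when the digit string passes the Verhoeff check.
const check = s =>
  [...s].reverse().reduce((c, ch, i) => D[c][P[i % 8][Number(ch)]], 0) === 0;

// Made-up 11-digit payload; append whichever final digit makes it pass.
const payload = '23456789012';
const valid = payload + '0123456789'.split('').find(k => check(payload + k));

// Counts corrupted variants of num that still pass the check.
function countUndetected(num) {
  let single = 0;
  let transposed = 0;
  for (let i = 0; i < num.length; i++) {
    // Every single-digit substitution at position i.
    for (let k = 0; k < 10; k++) {
      if (String(k) === num[i]) continue;
      if (check(num.slice(0, i) + k + num.slice(i + 1))) single++;
    }
    // Swap of positions i and i+1 (skipped when the digits are equal).
    if (i + 1 < num.length && num[i] !== num[i + 1]) {
      const swapped = num.slice(0, i) + num[i + 1] + num[i] + num.slice(i + 2);
      if (check(swapped)) transposed++;
    }
  }
  return { single, transposed };
}

console.log(countUndetected(valid)); // { single: 0, transposed: 0 }
```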

Production Considerations

Tip

While Verhoeff validation dramatically reduces false positives, it's not perfect. A random 12-digit number has a 1-in-10 chance of passing the Verhoeff check by coincidence. That's why production systems combine Verhoeff with contextual analysis.

Best practice: Use Verhoeff as a fast filter (eliminates 90% of false positives), then apply contextual scoring for the remaining candidates. Look for nearby keywords like 'Aadhaar', 'UID', 'UIDAI', 'आधार', or 'identity number' to boost confidence.

Also consider: Aadhaar numbers starting with 0 or 1 are not issued. This additional format rule eliminates another ~20% of false positives that pass Verhoeff by coincidence.
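As a minimal sketch of these two extra filters, the snippet below applies the leading-digit rule and a simple keyword window. The function names, the keyword list contents beyond those named above, and the 40-character window are assumptions made for illustration, not a description of any particular product's implementation.

```javascript
// Keywords taken from the best-practice note above (lowercased for matching).
const IDENTITY_KEYWORDS = ['aadhaar', 'uidai', 'uid', 'आधार', 'identity number'];

// Format rule: exactly 12 digits, and the first digit is never 0 or 1.
function passes_format_rules(n) {
  return /^[2-9]\d{11}$/.test(n);
}

// True when any keyword appears within windowSize characters of the match.
// Plain substring matching is used, so 'uid' also matches inside longer words;
// a production system would likely need stricter matching.
function keyword_nearby(text, index, length, windowSize = 40) {
  const start = Math.max(0, index - windowSize);
  const nearby = text.slice(start, index + length + windowSize).toLowerCase();
  return IDENTITY_KEYWORDS.some(k => nearby.includes(k));
}

// The number below is made up for the example.
const sample = 'Customer Aadhaar: 2345 6789 0123 (verified)';
const at = sample.indexOf('2345');

console.log(passes_format_rules('234567890123')); // true
console.log(passes_format_rules('123456789012')); // false (leading 1)
console.log(keyword_nearby(sample, at, 14));      // true
```

Running the format rule before the keyword search keeps the cheaper test first, so most rejected candidates never reach the text scan.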

CrewCheck's Implementation

CrewCheck's PII detection pipeline uses Verhoeff validation as the second stage of a four-stage Aadhaar detection process. After regex extraction identifies candidates, Verhoeff eliminates ~90% of false positives in under 0.1ms per candidate.

The remaining candidates are scored using contextual analysis — examining surrounding text for identity-related keywords in English, Hindi, and regional languages. This multi-stage approach achieves 99.7% detection accuracy with under 5% false positives.

The Verhoeff implementation is optimized with pre-computed lookup tables stored in memory, making each validation a simple series of array lookups rather than mathematical computations.

Frequently Asked Questions

Can a random number pass the Verhoeff check?

Yes — there's approximately a 1-in-10 (10%) chance that a random 12-digit number will pass. That's why Verhoeff is combined with contextual analysis and format rules (no leading 0 or 1) for production detection.

Is Verhoeff the same as Luhn (used for credit cards)?

No. Luhn detects all single-digit errors but misses one adjacent transposition: swapping 09 and 90. Verhoeff detects all single-digit errors and all adjacent transpositions, which makes it the stronger choice for identity number validation.
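Luhn's blind spot is easy to reproduce. The sketch below is a plain Luhn check (the function name `luhn_check` is chosen for illustration) run on short made-up digit strings: swapping an adjacent 0 and 9 goes unnoticed, while a different adjacent swap is caught.

```javascript
// Luhn check: from the right, double every second digit, subtract 9 from
// any doubled value above 9, and require the total to be divisible by 10.
function luhn_check(numStr) {
  if (!/^\d+$/.test(numStr)) return false; // accept digits only
  let sum = 0;
  numStr.split('').reverse().map(Number).forEach((digit, i) => {
    let value = digit;
    if (i % 2 === 1) {
      value *= 2;
      if (value > 9) value -= 9;
    }
    sum += value;
  });
  return sum % 10 === 0;
}

console.log(luhn_check('091'), luhn_check('901')); // true true  (0/9 swap is missed)
console.log(luhn_check('182'), luhn_check('812')); // true false (this swap is caught)
```

The miss happens because a doubled 9 contributes 9 and a doubled 0 contributes 0, so the pair adds the same amount to the total in either order.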

How fast is Verhoeff validation?

Extremely fast — O(n) with small constant factors. For 12-digit numbers, it's essentially a few array lookups. CrewCheck validates thousands of candidates per millisecond.

