Duplicate detection rules in Dataverse

How Dataverse's duplicate detection works — rules, matching algorithms, behaviour on create/update, and the maintenance discipline that keeps data clean.

Updated 2026-11-28

Duplicate records are the death of a CRM. The same customer captured three times produces fractured pipeline reporting, conflicting account ownership, repeated outreach to the same contact, and angry sales reps. Dataverse's duplicate detection flags likely duplicates when records are created, updated, or imported, giving users the chance to recognise the existing record or merge the two.

The model. Duplicate detection rules define what counts as a duplicate. Each rule:

Targets a base entity — Account, Contact, Lead, or any custom table.
Compares to a matching entity — usually the same table, sometimes a different one (Account vs Lead for lead-to-account matching).
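Conditions: field-by-field criteria for what makes records "duplicates":
- Exact match.
- Same first N characters.
- Same last N characters.
- Same date, or same date and time (date fields).
- Exact match on a choice (picklist) label or value.

Case sensitivity is a setting on the rule as a whole, not a separate operator, and there is no phonetic (sounds-like) operator.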

A rule can combine several conditions, and a pair of records is flagged only when every condition matches (logical AND). For example, "Email Address exact match AND First Name same first 3 characters" flags far fewer false positives than either condition alone.
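The way conditions combine can be sketched locally in Python. This is a minimal illustration, not Dataverse's implementation: the `Condition` class, the operator labels `exact`, `first_n`, and `last_n`, and the choice to treat blank values as non-matching are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

Record = Dict[str, str]


@dataclass(frozen=True)
class Condition:
    """One field comparison in a hypothetical rule."""
    field: str
    operator: str  # "exact", "first_n" or "last_n" (labels invented here)
    n: int = 0     # character count for first_n / last_n


def _normalise(value, case_sensitive: bool) -> str:
    text = (value or "").strip()
    return text if case_sensitive else text.casefold()


def condition_matches(cond: Condition, a: Record, b: Record,
                      case_sensitive: bool = False) -> bool:
    left = _normalise(a.get(cond.field), case_sensitive)
    right = _normalise(b.get(cond.field), case_sensitive)
    if not left or not right:
        return False  # assumption: blank values never count as a match
    if cond.operator == "exact":
        return left == right
    if cond.operator in ("first_n", "last_n"):
        if cond.n <= 0:
            raise ValueError("n must be positive for first_n / last_n")
        if cond.operator == "first_n":
            return left[:cond.n] == right[:cond.n]
        return left[-cond.n:] == right[-cond.n:]
    raise ValueError(f"unknown operator: {cond.operator}")


def rule_matches(conditions: List[Condition], a: Record, b: Record) -> bool:
    """A pair is a duplicate only if every condition matches (AND)."""
    return bool(conditions) and all(
        condition_matches(c, a, b) for c in conditions
    )
```

Called with `[Condition("email", "exact"), Condition("firstname", "first_n", 3)]`, the function flags two contacts only when both the email and the first three letters of the first name agree.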

Sample rules.

Account by name: exact match on Account Name. Loose; distinct companies can share a name, while spelling variants of the same company ("Contoso" vs "Contoso Ltd") slip through.
Account by primary email: exact match on Primary Email. Stricter; two records sharing an email are almost certainly the same account.
Contact by email: exact match on Email. The most reliable contact-matching rule.
Contact by name: composite rule in which First Name, Last Name, and Phone must all match.
Lead by company + email: exact match on Company Name AND Email.

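Behaviour on create. When a user (or an integration) creates a new record, the system evaluates the published duplicate detection rules for that table. If a match is found:

Interactive (form): a warning dialog lists the matched record(s), and the user can save anyway (keeping both), cancel the save, or go back and edit the new record.
Programmatic (API / import): API calls are not checked unless the caller opts in. With the SuppressDuplicateDetection parameter set to false (the MSCRM.SuppressDuplicateDetection: false header in the Web API), a create that matches a published rule is rejected with a duplicate error. With the parameter omitted or set to true, the record is saved without a check. Imports are checked only when duplicate detection is enabled for data import.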
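A Web API create that explicitly asks for duplicate detection might look like the sketch below. It only builds the request and classifies a response, so any HTTP client can send it. The `MSCRM.SuppressDuplicateDetection` header is the documented switch; the API version, the organisation URL, the function names, and the 412 status with error code `0x80040333` are stated here as I understand the documented behaviour and should be verified against your environment.

```python
import json

API_VERSION = "v9.2"                 # assumed Web API version
DUPLICATE_ERROR_CODE = "0x80040333"  # duplicate-found code; verify in your org


def build_create_request(org_url, entity_set, record,
                         access_token=None, detect_duplicates=True):
    """Describe a POST that sets the duplicate detection header explicitly."""
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        "Accept": "application/json",
        "OData-Version": "4.0",
        "OData-MaxVersion": "4.0",
        # "false" means: do NOT suppress detection, i.e. run the rules.
        "MSCRM.SuppressDuplicateDetection":
            "false" if detect_duplicates else "true",
    }
    if access_token:
        headers["Authorization"] = f"Bearer {access_token}"
    return {
        "method": "POST",
        "url": f"{org_url.rstrip('/')}/api/data/{API_VERSION}/{entity_set}",
        "headers": headers,
        "body": json.dumps(record).encode("utf-8"),
    }


def is_duplicate_rejection(status, body_text):
    """True if a response looks like a duplicate-detection rejection."""
    if status != 412:  # 412 is also used for other failed preconditions
        return False
    try:
        code = json.loads(body_text)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        return False
    return str(code).lower() == DUPLICATE_ERROR_CODE
```

An integration would typically catch the rejection, look up the existing record, and update it instead of retrying the create with detection suppressed.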

Behaviour on update. Updates can also trigger duplicate detection: saving an existing record can reveal that its edited values now match another record. Whether detection runs when records are created or updated is switched on in the environment's duplicate detection settings rather than rule by rule.

Bulk duplicate detection. Beyond inline detection at create time, duplicate detection jobs can scan an entire table against a rule retrospectively — identifying existing duplicates not caught during creation. The job produces a list of duplicate pairs; users review and merge.
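Conceptually, a retrospective scan groups records by the values a rule compares and reports every pair inside a group. The sketch below models that locally, assuming an exact-match rule over the listed fields and records held as dictionaries; it is an illustration, not the Dataverse job itself, and `match_key` and `find_duplicate_pairs` are names invented here.

```python
from collections import defaultdict
from itertools import combinations


def match_key(record, fields):
    """Normalised tuple of compared values, or None if any value is blank."""
    parts = [(record.get(f) or "").strip().casefold() for f in fields]
    return None if any(not p for p in parts) else tuple(parts)


def find_duplicate_pairs(records, fields, id_field="id"):
    """Return sorted (id, id) pairs of records sharing the same match key."""
    buckets = defaultdict(list)
    for record in records:
        key = match_key(record, fields)
        if key is not None:  # assumption: blank values are skipped
            buckets[key].append(record[id_field])
    pairs = []
    for ids in buckets.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)
```

Bucketing by key avoids comparing every record with every other record, which matters once a table holds more than a few thousand rows.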

Merging. Dataverse's merge function takes two records and combines them: one survives as the master, related records (activities, opportunities, cases) are re-parented from the duplicate to the master, and the duplicate is then deactivated rather than deleted. Field by field, the user chooses which value wins (usually the master's). Merge is available for Account, Contact, and Lead, plus Case in Dynamics 365 Customer Service. If auditing is enabled, the audit trail records the merge.
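For integrations, the same operation is exposed through the Web API's Merge action. The sketch below builds the request body as I understand that action's parameters (`Target`, `Subordinate`, `UpdateContent`, `PerformParentingChecks`); treat those names, the API version, and the helper `build_merge_request` as assumptions to check against the action reference before use.

```python
import json

API_VERSION = "v9.2"  # assumed Web API version


def build_merge_request(org_url, table, id_field, master_id, duplicate_id,
                        winning_values=None):
    """Describe a POST to the Merge action; nothing is sent here."""
    if master_id == duplicate_id:
        raise ValueError("master and duplicate must be different records")
    odata_type = f"Microsoft.Dynamics.CRM.{table}"
    payload = {
        # The record that survives the merge.
        "Target": {id_field: master_id, "@odata.type": odata_type},
        # The record that is deactivated after its children move across.
        "Subordinate": {id_field: duplicate_id, "@odata.type": odata_type},
        # Values to write onto the master (the field-by-field winners).
        "UpdateContent": {**(winning_values or {}),
                          "@odata.type": odata_type},
        "PerformParentingChecks": False,
    }
    return {
        "method": "POST",
        "url": f"{org_url.rstrip('/')}/api/data/{API_VERSION}/Merge",
        "headers": {
            "Content-Type": "application/json; charset=utf-8",
            "OData-Version": "4.0",
            "OData-MaxVersion": "4.0",
        },
        "body": json.dumps(payload).encode("utf-8"),
    }
```

Passing only the fields where the duplicate's value should win keeps the master's other values untouched.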

Limits.

Performance: every published rule adds work to each save, and a table can have at most five published rules. Keep the rules that matter; retire what's noise.
Slow high-volume imports: large data imports with duplicate detection enabled can be substantially slower than imports without. For initial data migration, often the cleaner pattern is to deduplicate the source data, import without detection, then enable detection going forward.
Cross-table matching limitations: matching a Lead against existing Contacts is supported, but every base/matching table pair needs its own rule, so coverage takes deliberate configuration.
No fuzzy-string matching beyond the built-in operators: for more sophisticated matching, use AI Builder or external deduplication tools.
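The pre-migration clean-up suggested for large imports can be done before the data ever reaches Dataverse. The sketch below assumes the source is CSV text with a header row and that rows sharing the same normalised key fields are duplicates; the function name and the keep-first policy are choices made for this example, not a prescribed method.

```python
import csv
import io


def dedupe_csv(text, key_fields):
    """Split CSV rows into (kept, rejected) by first occurrence of each key.

    rejected holds (line_number, first_seen_line_number, row) tuples so a
    reviewer can check each dropped row against the one that was kept.
    """
    reader = csv.DictReader(io.StringIO(text))
    first_seen = {}
    kept, rejected = [], []
    for row in reader:
        line_no = reader.line_num
        key = tuple((row.get(f) or "").strip().casefold()
                    for f in key_fields)
        complete = all(key)  # rows with a blank key field are never dropped
        if complete and key in first_seen:
            rejected.append((line_no, first_seen[key], row))
            continue
        if complete:
            first_seen[key] = line_no
        kept.append(row)
    return kept, rejected
```

Keeping the rejected rows, rather than silently discarding them, leaves a review list for values worth carrying over to the surviving row before import.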

Customer Insights – Data alternative. For organisations needing serious deduplication across many systems, Customer Insights – Data does enterprise-grade identity resolution with ML-assisted matching across multi-source data. Dataverse's duplicate detection is good for in-Dataverse hygiene; CI–Data is for cross-system unified-customer-profile work.

Operational discipline.

Define rules early — before users start creating records in earnest. Retroactive deduplication is expensive.
Tune rules with real data — too-strict rules miss legitimate duplicates; too-loose rules block legitimate distinct records (two genuinely different customers with similar names).
Run periodic duplicate detection jobs — monthly or quarterly to catch what slipped through.
Review and merge regularly — duplicates that exist but aren't merged are noise.
Train users — when the warning dialog appears, "keep both" is rarely the right answer. Train users to recognise and merge.
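Tuning rules with real data can be made measurable: take a sample of record pairs that someone has labelled as duplicate or distinct, and count how often a candidate rule agrees. The sketch below assumes a rule is any function taking two records and returning a boolean; `evaluate_rule` and the sample layout are illustrative only.

```python
def evaluate_rule(rule, labelled_pairs):
    """Score a rule against (record_a, record_b, is_duplicate) samples.

    Low precision means the rule is too loose (distinct records flagged);
    low recall means it is too strict (real duplicates missed).
    """
    tp = fp = fn = 0
    for a, b, is_duplicate in labelled_pairs:
        predicted = rule(a, b)
        if predicted and is_duplicate:
            tp += 1
        elif predicted:
            fp += 1
        elif is_duplicate:
            fn += 1
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
    }


def by_name(a, b):
    return a["name"].casefold() == b["name"].casefold()


def by_email(a, b):
    return a["email"].casefold() == b["email"].casefold()
```

Running both candidate rules over the same labelled sample gives a like-for-like comparison before either rule is published.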

Common pitfalls.

Disabled for performance — admins turn off duplicate detection during a migration and never re-enable.
Wrong rules for the data — rules don't match the real-world overlap patterns; duplicates accumulate undetected.
Merge cascades not understood: users panic when a merged duplicate disappears from their views, without realising that its activities and opportunities moved to the surviving record. Train.

Operational reality. Data hygiene compounds. Five minutes daily of duplicate management beats hours of forensic clean-up quarterly.
