Table 1 Summary of de-identification methods for individual-level data

From: Methods for the de-identification of electronic health records for genomic research

| De-identification method | Techniques | Details |
| --- | --- | --- |
| Masking (applied to direct identifiers) | Suppression/redaction | Direct identifiers are removed from the data or replaced with tags |
| | Random replacement/randomization | Direct identifiers (for example, names and medical record numbers) are replaced with randomly chosen values |
| | Pseudonymization | Direct identifiers are replaced with unique numbers that are not reversible |
| Generalization (applied to quasi-identifiers) | Hierarchy-based generalization | Generalization follows a predefined hierarchy describing how the precision of quasi-identifiers is reduced |
| | Cluster-based generalization | Individual transactions are grouped either empirically or according to predefined utility policies |
| Suppression (applied to records flagged for suppression) | Casewise deletion | The full record is deleted |
| | Quasi-identifier deletion | Only the quasi-identifiers are deleted |
| | Local cell suppression | An optimization scheme is applied to the quasi-identifiers to suppress the fewest values while keeping the re-identification probability below the threshold |
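
The three method families in the table can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the source: the toy records, the field names (`name`, `mrn`, `age`, `zip`, `dx`), the use of a keyed hash (HMAC-SHA256) as the non-reversible pseudonym, the 10-year age bands and 3-digit ZIP prefix as the generalization hierarchy, and the group-size threshold `k`. Only casewise deletion is shown for the suppression step; local cell suppression, which requires an optimization scheme, is not implemented here.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical toy records; all field names and values are illustrative.
RECORDS = [
    {"name": "Alice Smith", "mrn": "10001", "age": 34, "zip": "37203", "dx": "E11"},
    {"name": "Bob Jones", "mrn": "10002", "age": 37, "zip": "37212", "dx": "I10"},
    {"name": "Carol White", "mrn": "10003", "age": 62, "zip": "90210", "dx": "E11"},
    {"name": "Dan Brown", "mrn": "10004", "age": 31, "zip": "37240", "dx": "J45"},
]

QUASI_IDENTIFIERS = ("age", "zip")  # assumed quasi-identifiers


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed hash used as a pseudonym; not reversible without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(record: dict, key: bytes) -> dict:
    """Masking of direct identifiers."""
    out = dict(record)
    out["name"] = "[NAME]"                         # suppression/redaction: tag
    out["mrn"] = pseudonymize(record["mrn"], key)  # pseudonymization
    return out


def generalize(record: dict) -> dict:
    """Hierarchy-based generalization of quasi-identifiers (assumed hierarchy)."""
    out = dict(record)
    low = (record["age"] // 10) * 10
    out["age"] = f"{low}-{low + 9}"        # exact age -> 10-year band
    out["zip"] = record["zip"][:3] + "**"  # 5-digit ZIP -> 3-digit prefix
    return out


def casewise_delete(records: list, k: int) -> list:
    """Drop every record whose quasi-identifier combination is shared by
    fewer than k records, so each remaining group has at least k members."""
    def group_key(r):
        return tuple(r[q] for q in QUASI_IDENTIFIERS)

    counts = Counter(group_key(r) for r in records)
    return [r for r in records if counts[group_key(r)] >= k]


def deidentify(records: list, key: bytes, k: int = 2) -> list:
    """Mask, then generalize, then suppress records in small groups."""
    transformed = [generalize(mask(r, key)) for r in records]
    return casewise_delete(transformed, k)


if __name__ == "__main__":
    for row in deidentify(RECORDS, key=b"demo-key", k=2):
        print(row)
```

With the toy data and `k=2`, the three records that generalize to the same age band and ZIP prefix are kept, and the single record with age 62 is deleted because it is alone in its group. In this sketch, a group of size `k` corresponds to a re-identification probability of at most 1/k on the chosen quasi-identifiers, which is one simple way to read the threshold mentioned in the table.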