Advanced Privacy Preservation: Implementing L-Diversity and T-Closeness in Java

While k-anonymity protects against identity disclosure, it is vulnerable to attribute disclosure: if every record in a k-anonymous group shares the same sensitive value, an attacker who links a target to the group learns that value outright (the homogeneity attack; a short example follows the list below). L-Diversity and T-Closeness are advanced privacy models that provide stronger protection by enforcing diversity and distribution similarity of sensitive attributes within anonymized data groups.

Understanding the Privacy Models

Privacy Model Evolution:

  • k-Anonymity: Each group has at least k records (protects identity)
  • L-Diversity: Each group has at least L distinct sensitive values (protects attributes)
  • T-Closeness: Distribution of sensitive values in each group is within distance T of overall distribution (strongest protection)
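
For example, the group below is 3-anonymous, yet every record carries the same diagnosis, so anyone linked to the group learns the diagnosis with certainty; even 2-diversity would reject it:

Age    Zipcode  Disease
30-39  100**    Cancer
30-39  100**    Cancer
30-39  100**    Cancer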

Core Data Structures and Domain Models

1. Anonymized Dataset Structure

public class AnonymizedDataset {
private final List<QuasiIdentifier> quasiIdentifiers;
private final SensitiveAttribute sensitiveAttribute;
private final List<EquivalenceClass> equivalenceClasses;
private final PrivacyModel privacyModel;
public AnonymizedDataset(List<QuasiIdentifier> quasiIdentifiers, 
SensitiveAttribute sensitiveAttribute) {
this.quasiIdentifiers = quasiIdentifiers;
this.sensitiveAttribute = sensitiveAttribute;
this.equivalenceClasses = new ArrayList<>();
this.privacyModel = PrivacyModel.NONE;
}
public void addEquivalenceClass(EquivalenceClass eqClass) {
equivalenceClasses.add(eqClass);
}
public boolean satisfiesLDiversity(int l) {
return equivalenceClasses.stream()
.allMatch(eq -> eq.getDistinctSensitiveValues().size() >= l);
}
public boolean satisfiesTCloseness(double t, Distribution globalDistribution) {
return equivalenceClasses.stream()
.allMatch(eq -> eq.calculateEarthMoversDistance(globalDistribution) <= t);
}
// Getters
public List<EquivalenceClass> getEquivalenceClasses() { return equivalenceClasses; }
}
public enum PrivacyModel {
NONE, K_ANONYMITY, L_DIVERSITY, T_CLOSENESS
}

2. Equivalence Class Implementation

public class EquivalenceClass {
private final Map<String, Object> quasiIdentifierValues;
private final List<Record> records;
private final SensitiveAttribute sensitiveAttribute;
public EquivalenceClass(Map<String, Object> quasiIdentifierValues, 
SensitiveAttribute sensitiveAttribute) {
this.quasiIdentifierValues = new LinkedHashMap<>(quasiIdentifierValues);
this.records = new ArrayList<>();
this.sensitiveAttribute = sensitiveAttribute;
}
public void addRecord(Record record) {
records.add(record);
}
public Set<Object> getDistinctSensitiveValues() {
return records.stream()
.map(record -> record.getAttributeValue(sensitiveAttribute.getName()))
.collect(Collectors.toSet());
}
public int getDistinctSensitiveCount() {
return getDistinctSensitiveValues().size();
}
public double calculateEarthMoversDistance(Distribution globalDistribution) {
Distribution localDistribution = calculateLocalDistribution();
return EarthMoversDistanceCalculator.calculate(localDistribution, globalDistribution);
}
public Distribution calculateLocalDistribution() {
Map<Object, Double> probabilities = new HashMap<>();
long totalRecords = records.size();
// Count frequency of each sensitive value
Map<Object, Long> frequency = records.stream()
.collect(Collectors.groupingBy(
record -> record.getAttributeValue(sensitiveAttribute.getName()),
Collectors.counting()
));
// Convert to probabilities
for (Map.Entry<Object, Long> entry : frequency.entrySet()) {
probabilities.put(entry.getKey(), (double) entry.getValue() / totalRecords);
}
return new Distribution(probabilities, sensitiveAttribute);
}
public boolean satisfiesLDiversity(int l) {
return getDistinctSensitiveCount() >= l;
}
// Getters
public List<Record> getRecords() { return new ArrayList<>(records); }
public int size() { return records.size(); }
}

3. Data Record and Attribute Models

public class Record {
private final String id;
private final Map<String, Object> attributes;
public Record(String id) {
this.id = id;
this.attributes = new HashMap<>();
}
public void setAttribute(String name, Object value) {
attributes.put(name, value);
}
public Object getAttributeValue(String name) {
return attributes.get(name);
}
public Map<String, Object> getQuasiIdentifierValues(List<QuasiIdentifier> quasiIdentifiers) {
return quasiIdentifiers.stream()
.collect(Collectors.toMap(
QuasiIdentifier::getName,
qi -> attributes.get(qi.getName())
));
}
// Getters
public String getId() { return id; }
public Map<String, Object> getAttributes() { return new HashMap<>(attributes); }
}
public class QuasiIdentifier {
private final String name;
private final DataType dataType;
private final GeneralizationHierarchy hierarchy;
public QuasiIdentifier(String name, DataType dataType, GeneralizationHierarchy hierarchy) {
this.name = name;
this.dataType = dataType;
this.hierarchy = hierarchy;
}
// Getters
public String getName() { return name; }
public DataType getDataType() { return dataType; }
public GeneralizationHierarchy getHierarchy() { return hierarchy; }
}
public class SensitiveAttribute {
private final String name;
private final DataType dataType;
private final boolean isCategorical;
public SensitiveAttribute(String name, DataType dataType, boolean isCategorical) {
this.name = name;
this.dataType = dataType;
this.isCategorical = isCategorical;
}
// Getters
public String getName() { return name; }
public boolean isCategorical() { return isCategorical; }
}
public enum DataType {
INTEGER, STRING, DATE, NUMERIC, CATEGORICAL
}
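
Before moving on, a minimal sketch (with illustrative values) of the mechanism both anonymizers below rely on: records that project to the same quasi-identifier map are grouped into the same equivalence class.

QuasiIdentifier age = new QuasiIdentifier("age", DataType.INTEGER, null);
QuasiIdentifier zip = new QuasiIdentifier("zipcode", DataType.STRING, null);
Record r1 = new Record("ID_1");
r1.setAttribute("age", 34);
r1.setAttribute("zipcode", "10012");
r1.setAttribute("disease", "Flu");
Record r2 = new Record("ID_2");
r2.setAttribute("age", 34);
r2.setAttribute("zipcode", "10012");
r2.setAttribute("disease", "Cancer");
// Identical projections, so Collectors.groupingBy puts both in one class
boolean sameGroup = r1.getQuasiIdentifierValues(List.of(age, zip))
.equals(r2.getQuasiIdentifierValues(List.of(age, zip))); // true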

L-Diversity Implementation

1. L-Diversity Anonymizer

public class LDiversityAnonymizer {
private final int l;
private final List<QuasiIdentifier> quasiIdentifiers;
private final SensitiveAttribute sensitiveAttribute;
public LDiversityAnonymizer(int l, List<QuasiIdentifier> quasiIdentifiers, 
SensitiveAttribute sensitiveAttribute) {
if (l < 2) {
throw new IllegalArgumentException("L must be at least 2; l = 1 is satisfied by any non-empty group and adds nothing beyond k-anonymity");
}
this.l = l;
this.quasiIdentifiers = quasiIdentifiers;
this.sensitiveAttribute = sensitiveAttribute;
}
public AnonymizedDataset anonymize(List<Record> records) {
AnonymizedDataset dataset = new AnonymizedDataset(quasiIdentifiers, sensitiveAttribute);
// Group records by quasi-identifier values
Map<Map<String, Object>, List<Record>> groups = records.stream()
.collect(Collectors.groupingBy(
record -> record.getQuasiIdentifierValues(quasiIdentifiers)
));
// Create equivalence classes and check L-Diversity
for (Map.Entry<Map<String, Object>, List<Record>> entry : groups.entrySet()) {
EquivalenceClass eqClass = new EquivalenceClass(entry.getKey(), sensitiveAttribute);
entry.getValue().forEach(eqClass::addRecord);
// Only add if it satisfies L-Diversity
if (eqClass.satisfiesLDiversity(l)) {
dataset.addEquivalenceClass(eqClass);
} else {
// Handle non-diverse groups (split or generalize further)
handleNonDiverseGroup(eqClass, dataset);
}
}
return dataset;
}
private void handleNonDiverseGroup(EquivalenceClass eqClass, AnonymizedDataset dataset) {
if (eqClass.size() < l) {
// Too few records - suppress or merge
suppressGroup(eqClass);
} else {
// Enough records but not diverse - apply generalization
generalizeAndSplit(eqClass, dataset);
}
}
private void suppressGroup(EquivalenceClass eqClass) {
// Suppression here simply means the group is never added to the output
// dataset; a production system would log or audit the dropped records
System.out.println("Suppressing group with " + eqClass.size() + " records");
}
private void generalizeAndSplit(EquivalenceClass eqClass, AnonymizedDataset dataset) {
// Apply generalization to quasi-identifiers and split
List<EquivalenceClass> splitClasses = applyGeneralization(eqClass);
for (EquivalenceClass splitClass : splitClasses) {
if (splitClass.satisfiesLDiversity(l)) {
dataset.addEquivalenceClass(splitClass);
} else {
// Recursively handle until L-Diversity is satisfied
handleNonDiverseGroup(splitClass, dataset);
}
}
}
private List<EquivalenceClass> applyGeneralization(EquivalenceClass eqClass) {
// Implement generalization logic
// This is a simplified version
List<EquivalenceClass> result = new ArrayList<>();
// Example: generalize one quasi-identifier and split
if (!quasiIdentifiers.isEmpty()) {
QuasiIdentifier toGeneralize = quasiIdentifiers.get(0);
Map<Object, List<Record>> generalizedGroups = generalizeAttribute(
eqClass, toGeneralize);
for (Map.Entry<Object, List<Record>> entry : generalizedGroups.entrySet()) {
Map<String, Object> newQIValues = new HashMap<>(eqClass.getQuasiIdentifierValues());
newQIValues.put(toGeneralize.getName(), entry.getKey());
EquivalenceClass newClass = new EquivalenceClass(newQIValues, sensitiveAttribute);
entry.getValue().forEach(newClass::addRecord);
result.add(newClass);
}
}
return result;
}
private Map<Object, List<Record>> generalizeAttribute(EquivalenceClass eqClass, 
QuasiIdentifier quasiIdentifier) {
// Implement attribute generalization using hierarchy
Map<Object, List<Record>> groups = new HashMap<>();
for (Record record : eqClass.getRecords()) {
Object originalValue = record.getAttributeValue(quasiIdentifier.getName());
Object generalizedValue = quasiIdentifier.getHierarchy()
.generalize(originalValue, 1); // Generalize one level
groups.computeIfAbsent(generalizedValue, k -> new ArrayList<>()).add(record);
}
return groups;
}
// Accessor used by subclasses such as EnhancedLDiversityAnonymizer below
protected SensitiveAttribute getSensitiveAttribute() { return sensitiveAttribute; }
// Analysis methods
public LDiversityAnalysis analyzeDataset(AnonymizedDataset dataset) {
int totalGroups = dataset.getEquivalenceClasses().size();
int compliantGroups = (int) dataset.getEquivalenceClasses().stream()
.filter(eq -> eq.satisfiesLDiversity(l))
.count();
double complianceRate = totalGroups == 0 ? 1.0 : (double) compliantGroups / totalGroups;
return new LDiversityAnalysis(l, totalGroups, compliantGroups, complianceRate);
}
}
public class LDiversityAnalysis {
private final int l;
private final int totalGroups;
private final int compliantGroups;
private final double complianceRate;
public LDiversityAnalysis(int l, int totalGroups, int compliantGroups, double complianceRate) {
this.l = l;
this.totalGroups = totalGroups;
this.compliantGroups = compliantGroups;
this.complianceRate = complianceRate;
}
// Getters and toString
public double getComplianceRate() { return complianceRate; }
public boolean isFullyCompliant() { return complianceRate == 1.0; }
}

2. Enhanced L-Diversity Variants

public class EnhancedLDiversityAnonymizer extends LDiversityAnonymizer {
public EnhancedLDiversityAnonymizer(int l, List<QuasiIdentifier> quasiIdentifiers, 
SensitiveAttribute sensitiveAttribute) {
super(l, quasiIdentifiers, sensitiveAttribute);
}
public boolean satisfiesEntropyLDiversity(EquivalenceClass eqClass, double minEntropy) {
Distribution distribution = eqClass.calculateLocalDistribution();
double entropy = calculateEntropy(distribution);
return entropy >= minEntropy;
}
public boolean satisfiesRecursiveLDiversity(EquivalenceClass eqClass, double c) {
List<Object> sensitiveValues = eqClass.getRecords().stream()
.map(record -> record.getAttributeValue(getSensitiveAttribute().getName()))
.collect(Collectors.toList());
// Count frequencies
Map<Object, Long> frequencies = sensitiveValues.stream()
.collect(Collectors.groupingBy(v -> v, Collectors.counting()));
long total = sensitiveValues.size();
long maxFrequency = frequencies.values().stream()
.mapToLong(Long::longValue)
.max()
.orElse(0L);
// Simplified recursive (c, l)-diversity check for the l = 2 case: the full
// definition requires r1 < c * (rl + ... + rm) over the value frequencies
// sorted in descending order r1 >= r2 >= ... >= rm
return maxFrequency < c * (total - maxFrequency);
}
private double calculateEntropy(Distribution distribution) {
double entropy = 0.0;
for (Double probability : distribution.getProbabilities().values()) {
if (probability > 0) {
entropy -= probability * Math.log(probability);
}
}
return entropy;
}
}
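
A brief usage sketch of these variants (the setup values are hypothetical): entropy L-Diversity requires the entropy of each class's sensitive-value distribution to be at least log(l), so for l = 3 the threshold below is Math.log(3), roughly 1.099.

SensitiveAttribute disease = new SensitiveAttribute("disease", DataType.CATEGORICAL, true);
EquivalenceClass eqClass = new EquivalenceClass(Map.of("age", (Object) "30-39"), disease);
// ... populate eqClass with records as in LDiversityAnonymizer.anonymize() ...
EnhancedLDiversityAnonymizer enhanced =
new EnhancedLDiversityAnonymizer(3, List.of(), disease);
boolean entropyOk = enhanced.satisfiesEntropyLDiversity(eqClass, Math.log(3));
boolean recursiveOk = enhanced.satisfiesRecursiveLDiversity(eqClass, 2.0); // (2, 2)-diversity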

T-Closeness Implementation

1. Distribution and Distance Metrics

public class Distribution {
private final Map<Object, Double> probabilities;
private final SensitiveAttribute sensitiveAttribute;
public Distribution(Map<Object, Double> probabilities, SensitiveAttribute sensitiveAttribute) {
this.probabilities = new HashMap<>(probabilities);
this.sensitiveAttribute = sensitiveAttribute;
normalize();
}
private void normalize() {
double total = probabilities.values().stream().mapToDouble(Double::doubleValue).sum();
if (total > 0 && Math.abs(total - 1.0) > 1e-10) {
probabilities.replaceAll((k, v) -> v / total);
}
}
public double getProbability(Object value) {
return probabilities.getOrDefault(value, 0.0);
}
public Set<Object> getValues() {
return probabilities.keySet();
}
// Getters
public Map<Object, Double> getProbabilities() { return new HashMap<>(probabilities); }
public SensitiveAttribute getSensitiveAttribute() { return sensitiveAttribute; }
}
public class EarthMoversDistanceCalculator {
public static double calculate(Distribution dist1, Distribution dist2) {
// For categorical attributes with hierarchical distance
if (dist1.getSensitiveAttribute().isCategorical()) {
return calculateCategoricalEMD(dist1, dist2);
} else {
// For numerical attributes
return calculateNumericalEMD(dist1, dist2);
}
}
private static double calculateCategoricalEMD(Distribution dist1, Distribution dist2) {
// With equal ground distance between all categories, EMD reduces to the
// total variation distance. In practice you would derive a ground-distance
// matrix (e.g., from a generalization hierarchy over the categories)
double totalDistance = 0.0;
Set<Object> allValues = new HashSet<>();
allValues.addAll(dist1.getValues());
allValues.addAll(dist2.getValues());
for (Object value : allValues) {
double diff = Math.abs(dist1.getProbability(value) - dist2.getProbability(value));
totalDistance += diff;
}
return totalDistance / 2.0; // Half the L1 distance, bounded by 1
}
private static double calculateNumericalEMD(Distribution dist1, Distribution dist2) {
// Greedy mass-matching over the sorted supports, which yields the optimal
// transport plan in one dimension. Keys are kept in their original form so
// probability lookups still hit the map (values may be stored as Integers).
// The raw cost is normalized by the span of the combined support so the
// result is comparable to a threshold t in [0, 1]
List<Object> values1 = getOrderedValues(dist1);
List<Object> values2 = getOrderedValues(dist2);
if (values1.isEmpty() || values2.isEmpty()) return 0.0;
double emd = 0.0;
double consumed1 = 0.0; // mass already matched from values1.get(i)
double consumed2 = 0.0; // mass already matched from values2.get(j)
int i = 0, j = 0;
while (i < values1.size() && j < values2.size()) {
double val1 = toDouble(values1.get(i));
double val2 = toDouble(values2.get(j));
double prob1 = dist1.getProbability(values1.get(i));
double prob2 = dist2.getProbability(values2.get(j));
double moved = Math.min(prob1 - consumed1, prob2 - consumed2);
emd += moved * Math.abs(val1 - val2);
consumed1 += moved;
consumed2 += moved;
if (consumed1 >= prob1 - 1e-10) { i++; consumed1 = 0.0; }
if (consumed2 >= prob2 - 1e-10) { j++; consumed2 = 0.0; }
}
double min = Math.min(toDouble(values1.get(0)), toDouble(values2.get(0)));
double max = Math.max(toDouble(values1.get(values1.size() - 1)),
toDouble(values2.get(values2.size() - 1)));
return max > min ? emd / (max - min) : 0.0;
}
private static List<Object> getOrderedValues(Distribution dist) {
// Sort the original keys by numeric value but keep the keys themselves,
// so they remain valid lookups into the probability map
return dist.getValues().stream()
.sorted(Comparator.comparingDouble(EarthMoversDistanceCalculator::toDouble))
.collect(Collectors.toList());
}
private static double toDouble(Object value) {
return Double.parseDouble(value.toString());
}
}
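
A minimal sketch of the calculator in action, with made-up probabilities. For categorical data the result is the total variation distance: (|0.8 - 0.5| + |0.2 - 0.5|) / 2 = 0.3.

SensitiveAttribute disease = new SensitiveAttribute("disease", DataType.CATEGORICAL, true);
Map<Object, Double> p = new HashMap<>();
p.put("Flu", 0.8);
p.put("Cancer", 0.2);
Map<Object, Double> q = new HashMap<>();
q.put("Flu", 0.5);
q.put("Cancer", 0.5);
double distance = EarthMoversDistanceCalculator.calculate(
new Distribution(p, disease), new Distribution(q, disease)); // 0.3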

2. T-Closeness Anonymizer

public class TClosenessAnonymizer {
private final double t;
private final List<QuasiIdentifier> quasiIdentifiers;
private final SensitiveAttribute sensitiveAttribute;
private Distribution globalDistribution;
public TClosenessAnonymizer(double t, List<QuasiIdentifier> quasiIdentifiers, 
SensitiveAttribute sensitiveAttribute) {
if (t < 0 || t > 1) {
throw new IllegalArgumentException("T must be between 0 and 1");
}
this.t = t;
this.quasiIdentifiers = quasiIdentifiers;
this.sensitiveAttribute = sensitiveAttribute;
}
public AnonymizedDataset anonymize(List<Record> records) {
// Calculate global distribution
this.globalDistribution = calculateGlobalDistribution(records);
AnonymizedDataset dataset = new AnonymizedDataset(quasiIdentifiers, sensitiveAttribute);
// Start with k-anonymity grouping
Map<Map<String, Object>, List<Record>> groups = records.stream()
.collect(Collectors.groupingBy(
record -> record.getQuasiIdentifierValues(quasiIdentifiers)
));
// Process each group for T-Closeness
for (Map.Entry<Map<String, Object>, List<Record>> entry : groups.entrySet()) {
EquivalenceClass eqClass = new EquivalenceClass(entry.getKey(), sensitiveAttribute);
entry.getValue().forEach(eqClass::addRecord);
if (satisfiesTCloseness(eqClass)) {
dataset.addEquivalenceClass(eqClass);
} else {
handleNonCloseGroup(eqClass, dataset);
}
}
return dataset;
}
private Distribution calculateGlobalDistribution(List<Record> records) {
Map<Object, Long> frequency = records.stream()
.collect(Collectors.groupingBy(
record -> record.getAttributeValue(sensitiveAttribute.getName()),
Collectors.counting()
));
Map<Object, Double> probabilities = new HashMap<>();
long total = records.size();
for (Map.Entry<Object, Long> entry : frequency.entrySet()) {
probabilities.put(entry.getKey(), (double) entry.getValue() / total);
}
return new Distribution(probabilities, sensitiveAttribute);
}
private boolean satisfiesTCloseness(EquivalenceClass eqClass) {
double emd = eqClass.calculateEarthMoversDistance(globalDistribution);
return emd <= t;
}
private void handleNonCloseGroup(EquivalenceClass eqClass, AnonymizedDataset dataset) {
// Strategy 1: split the group by sensitive value
List<EquivalenceClass> splitClasses = splitForTCloseness(eqClass);
List<EquivalenceClass> failing = new ArrayList<>();
for (EquivalenceClass splitClass : splitClasses) {
if (satisfiesTCloseness(splitClass)) {
dataset.addEquivalenceClass(splitClass);
} else {
failing.add(splitClass);
}
}
// Strategy 2: greedily re-merge only the classes that still violate
// T-Closeness. (Re-splitting them would recurse forever, since a class
// holding a single sensitive value cannot be split any further, and
// merging everything would duplicate the classes already added above.)
if (failing.size() > 1) {
tryMerging(failing, dataset);
} else {
failing.forEach(this::suppressGroup);
}
}
private List<EquivalenceClass> splitForTCloseness(EquivalenceClass eqClass) {
// Split based on sensitive attribute values to improve distribution
Map<Object, List<Record>> valueGroups = eqClass.getRecords().stream()
.collect(Collectors.groupingBy(
record -> record.getAttributeValue(sensitiveAttribute.getName())
));
// If we have too many distinct values, group similar values
if (valueGroups.size() > 10) { // Arbitrary threshold
valueGroups = groupSimilarValues(valueGroups);
}
List<EquivalenceClass> result = new ArrayList<>();
for (Map.Entry<Object, List<Record>> entry : valueGroups.entrySet()) {
// Create new equivalence class with same QI values but subset of records
EquivalenceClass newClass = new EquivalenceClass(
eqClass.getQuasiIdentifierValues(), sensitiveAttribute);
entry.getValue().forEach(newClass::addRecord);
result.add(newClass);
}
return result;
}
private Map<Object, List<Record>> groupSimilarValues(Map<Object, List<Record>> valueGroups) {
// Group similar sensitive values (for numerical or hierarchical categorical data)
// This is a simplified implementation
Map<Object, List<Record>> grouped = new HashMap<>();
if (sensitiveAttribute.isCategorical()) {
// For categorical, group by first letter or some other heuristic
valueGroups.forEach((value, records) -> {
String groupKey = value.toString().substring(0, 1); // First character
grouped.computeIfAbsent(groupKey, k -> new ArrayList<>()).addAll(records);
});
} else {
// For numerical, group by ranges
valueGroups.forEach((value, records) -> {
double numValue = Double.parseDouble(value.toString());
String groupKey = String.valueOf(Math.floor(numValue / 10.0) * 10); // Groups of 10
grouped.computeIfAbsent(groupKey, k -> new ArrayList<>()).addAll(records);
});
}
return grouped;
}
private void tryMerging(List<EquivalenceClass> classes, AnonymizedDataset dataset) {
// Try to merge classes to achieve T-Closeness
// This is a greedy merging approach
List<EquivalenceClass> merged = new ArrayList<>(classes);
boolean improved;
do {
improved = false;
for (int i = 0; i < merged.size(); i++) {
for (int j = i + 1; j < merged.size(); j++) {
EquivalenceClass mergedClass = mergeClasses(merged.get(i), merged.get(j));
if (satisfiesTCloseness(mergedClass)) {
merged.set(i, mergedClass);
merged.remove(j);
improved = true;
break;
}
}
if (improved) break;
}
} while (improved);
// Add successfully merged classes
for (EquivalenceClass eqClass : merged) {
if (satisfiesTCloseness(eqClass)) {
dataset.addEquivalenceClass(eqClass);
} else {
suppressGroup(eqClass);
}
}
}
private EquivalenceClass mergeClasses(EquivalenceClass class1, EquivalenceClass class2) {
// Merge two equivalence classes (generalize QI values)
Map<String, Object> mergedQI = generalizeQIValues(
class1.getQuasiIdentifierValues(), 
class2.getQuasiIdentifierValues()
);
EquivalenceClass merged = new EquivalenceClass(mergedQI, sensitiveAttribute);
class1.getRecords().forEach(merged::addRecord);
class2.getRecords().forEach(merged::addRecord);
return merged;
}
private Map<String, Object> generalizeQIValues(Map<String, Object> qi1, Map<String, Object> qi2) {
Map<String, Object> merged = new HashMap<>();
for (String key : qi1.keySet()) {
Object val1 = qi1.get(key);
Object val2 = qi2.get(key);
// Find quasi-identifier and generalize
QuasiIdentifier qi = quasiIdentifiers.stream()
.filter(q -> q.getName().equals(key))
.findFirst()
.orElseThrow();
Object generalized = generalizeValues(val1, val2, qi);
merged.put(key, generalized);
}
return merged;
}
private Object generalizeValues(Object val1, Object val2, QuasiIdentifier qi) {
// Use generalization hierarchy
if (qi.getHierarchy() != null) {
return qi.getHierarchy().findCommonGeneralization(val1, val2);
}
// Fallback: use range for numerical, set for categorical
if (qi.getDataType() == DataType.INTEGER || qi.getDataType() == DataType.NUMERIC) {
double num1 = Double.parseDouble(val1.toString());
double num2 = Double.parseDouble(val2.toString());
double min = Math.min(num1, num2);
double max = Math.max(num1, num2);
return String.format("[%.2f-%.2f]", min, max);
} else {
return Set.of(val1, val2).toString();
}
}
private void suppressGroup(EquivalenceClass eqClass) {
System.out.println("Suppressing group with " + eqClass.size() + " records for T-Closeness");
}
}

Generalization Hierarchies

1. Hierarchy Implementation

public interface GeneralizationHierarchy {
Object generalize(Object value, int levels);
Object specialize(Object value, int levels);
Object findCommonGeneralization(Object value1, Object value2);
int getHeight();
}
public class NumericalRangeHierarchy implements GeneralizationHierarchy {
private final List<Range> levels;
public NumericalRangeHierarchy(List<Range> levels) {
this.levels = new ArrayList<>(levels);
}
@Override
public Object generalize(Object value, int levelsUp) {
// The parameter is renamed so it no longer shadows the levels field
if (levelsUp <= 0) return value;
double numValue = Double.parseDouble(value.toString());
Range targetLevel = levels.get(Math.min(levelsUp - 1, levels.size() - 1));
return targetLevel.contains(numValue) ? targetLevel : value;
}
@Override
public Object specialize(Object value, int levels) {
// Implementation for specialization
return value; // Simplified
}
@Override
public Object findCommonGeneralization(Object value1, Object value2) {
double num1 = Double.parseDouble(value1.toString());
double num2 = Double.parseDouble(value2.toString());
for (Range level : levels) {
if (level.contains(num1) && level.contains(num2)) {
return level;
}
}
// Return the most general level
return levels.get(levels.size() - 1);
}
@Override
public int getHeight() {
return levels.size();
}
}
public class Range {
private final double min;
private final double max;
private final String label;
public Range(double min, double max, String label) {
this.min = min;
this.max = max;
this.label = label;
}
public boolean contains(double value) {
return value >= min && value <= max;
}
@Override
public String toString() {
return label;
}
}
public class CategoricalHierarchy implements GeneralizationHierarchy {
private final Map<Object, Object> parentMap;
private final Map<Object, Integer> levelMap;
private final Object root;
public CategoricalHierarchy(Map<Object, Object> hierarchy, Object root) {
this.parentMap = new HashMap<>(hierarchy);
this.root = root;
this.levelMap = calculateLevels(hierarchy, root);
}
private Map<Object, Integer> calculateLevels(Map<Object, Object> hierarchy, Object root) {
Map<Object, Integer> levels = new HashMap<>();
calculateLevelsRecursive(root, 0, hierarchy, levels);
return levels;
}
private void calculateLevelsRecursive(Object node, int level, 
Map<Object, Object> hierarchy, 
Map<Object, Integer> levels) {
levels.put(node, level);
hierarchy.entrySet().stream()
.filter(entry -> entry.getValue().equals(node))
.forEach(entry -> calculateLevelsRecursive(entry.getKey(), level + 1, hierarchy, levels));
}
@Override
public Object generalize(Object value, int levels) {
Object current = value;
for (int i = 0; i < levels; i++) {
current = parentMap.get(current);
if (current == null) break;
}
return current != null ? current : value;
}
@Override
public Object specialize(Object value, int levels) {
// Not commonly used in anonymization
return value;
}
@Override
public Object findCommonGeneralization(Object value1, Object value2) {
Set<Object> ancestors1 = getAncestors(value1);
Set<Object> ancestors2 = getAncestors(value2);
// Find common ancestors and return the most specific one: among shared
// ancestors, the one deepest in the hierarchy (largest level) is the
// lowest common ancestor
return ancestors1.stream()
.filter(ancestors2::contains)
.max(Comparator.comparing(levelMap::get))
.orElse(root);
}
private Set<Object> getAncestors(Object value) {
Set<Object> ancestors = new HashSet<>();
Object current = value;
while (current != null) {
ancestors.add(current);
current = parentMap.get(current);
}
return ancestors;
}
@Override
public int getHeight() {
return levelMap.values().stream().max(Integer::compareTo).orElse(0) + 1;
}
}
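
A small sketch of the categorical hierarchy in action, using a made-up two-level disease taxonomy (the values are illustrative, not from the sample dataset):

// Hypothetical taxonomy: leaves -> category -> root
Map<Object, Object> parents = new HashMap<>();
parents.put("Flu", "Respiratory");
parents.put("Asthma", "Respiratory");
parents.put("Diabetes", "Metabolic");
parents.put("Respiratory", "Any");
parents.put("Metabolic", "Any");
GeneralizationHierarchy taxonomy = new CategoricalHierarchy(parents, "Any");
taxonomy.generalize("Flu", 1);                        // -> "Respiratory"
taxonomy.findCommonGeneralization("Flu", "Asthma");   // -> "Respiratory"
taxonomy.findCommonGeneralization("Flu", "Diabetes"); // -> "Any"
taxonomy.getHeight();                                 // -> 3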

Practical Usage Example

1. Complete Anonymization Pipeline

public class PrivacyPreservationDemo {
public static void main(String[] args) {
// Create sample dataset
List<Record> records = createSampleDataset();
// Define quasi-identifiers
List<QuasiIdentifier> quasiIdentifiers = Arrays.asList(
new QuasiIdentifier("age", DataType.INTEGER, createAgeHierarchy()),
new QuasiIdentifier("zipcode", DataType.STRING, createZipcodeHierarchy()),
new QuasiIdentifier("gender", DataType.CATEGORICAL, createGenderHierarchy())
);
// Define sensitive attribute
SensitiveAttribute sensitiveAttribute = new SensitiveAttribute("disease", 
DataType.CATEGORICAL, true);
// Apply L-Diversity
LDiversityAnonymizer lDiversityAnonymizer = new LDiversityAnonymizer(3, 
quasiIdentifiers, sensitiveAttribute);
AnonymizedDataset lDiverseDataset = lDiversityAnonymizer.anonymize(records);
LDiversityAnalysis lAnalysis = lDiversityAnonymizer.analyzeDataset(lDiverseDataset);
System.out.println("L-Diversity Analysis:");
System.out.println("Compliance Rate: " + lAnalysis.getComplianceRate());
System.out.println("Fully Compliant: " + lAnalysis.isFullyCompliant());
// Apply T-Closeness
TClosenessAnonymizer tClosenessAnonymizer = new TClosenessAnonymizer(0.2, 
quasiIdentifiers, sensitiveAttribute);
AnonymizedDataset tCloseDataset = tClosenessAnonymizer.anonymize(records);
System.out.println("\nT-Closeness Result:");
System.out.println("Number of equivalence classes: " + 
tCloseDataset.getEquivalenceClasses().size());
// Analyze information loss
double informationLoss = calculateInformationLoss(tCloseDataset, records);
System.out.println("Information Loss: " + informationLoss);
}
private static List<Record> createSampleDataset() {
List<Record> records = new ArrayList<>();
// Add sample records with age, zipcode, gender, disease
String[] diseases = {"Flu", "Cancer", "Diabetes", "Heart Disease", "Asthma"};
Random random = new Random(42);
for (int i = 0; i < 1000; i++) {
Record record = new Record("ID_" + i);
record.setAttribute("age", 20 + random.nextInt(60)); // 20-79
record.setAttribute("zipcode", "1000" + random.nextInt(100)); // 10000-10099
record.setAttribute("gender", random.nextBoolean() ? "M" : "F");
record.setAttribute("disease", diseases[random.nextInt(diseases.length)]);
records.add(record);
}
return records;
}
private static double calculateInformationLoss(AnonymizedDataset dataset, 
List<Record> originalRecords) {
// Simplified utility metric: the fraction of records lost to suppression.
// A full implementation would also charge for the generalization applied
// to each quasi-identifier
long retained = dataset.getEquivalenceClasses().stream()
.mapToLong(EquivalenceClass::size)
.sum();
return 1.0 - (double) retained / originalRecords.size();
}
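// The three hierarchy factories referenced in main() are not part of the
// original listing; the sketches below are hypothetical implementations
// built on the NumericalRangeHierarchy and CategoricalHierarchy classes
// defined earlier, sized to match the sample data
private static GeneralizationHierarchy createAgeHierarchy() {
// One Range per level, each coarser than the last, matching the
// simplified NumericalRangeHierarchy semantics
return new NumericalRangeHierarchy(List.of(
new Range(20, 39, "20-39"),
new Range(20, 59, "20-59"),
new Range(20, 79, "20-79")));
}
private static GeneralizationHierarchy createZipcodeHierarchy() {
// Every sample zipcode (10000-10099) generalizes to the prefix "100**",
// which in turn generalizes to "*"
Map<Object, Object> parents = new HashMap<>();
for (int i = 0; i < 100; i++) {
parents.put(String.format("10%03d", i), "100**");
}
parents.put("100**", "*");
return new CategoricalHierarchy(parents, "*");
}
private static GeneralizationHierarchy createGenderHierarchy() {
Map<Object, Object> parents = new HashMap<>();
parents.put("M", "*");
parents.put("F", "*");
return new CategoricalHierarchy(parents, "*");
}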
}

Enterprise Integration

1. Spring Boot Service

@Service
public class AnonymizationService {
private final DataSourceService dataSourceService;
private final AnonymizationConfigRepository configRepository;
public AnonymizationService(DataSourceService dataSourceService,
AnonymizationConfigRepository configRepository) {
this.dataSourceService = dataSourceService;
this.configRepository = configRepository;
}
@Transactional
public AnonymizationResult anonymizeDataset(String datasetId, String configId) {
AnonymizationConfig config = configRepository.findById(configId)
.orElseThrow(() -> new IllegalArgumentException("Config not found: " + configId));
List<Record> records = dataSourceService.loadRecords(datasetId);
AnonymizedDataset anonymizedDataset;
PrivacyMetrics metrics;
switch (config.getPrivacyModel()) {
case L_DIVERSITY:
anonymizedDataset = applyLDiversity(records, config);
break;
case T_CLOSENESS:
anonymizedDataset = applyTCloseness(records, config);
break;
default:
throw new IllegalArgumentException("Unsupported privacy model");
}
metrics = calculatePrivacyMetrics(anonymizedDataset, config);
dataSourceService.saveAnonymizedDataset(anonymizedDataset, datasetId + "_anonymized");
return new AnonymizationResult(anonymizedDataset, metrics, config);
}
private AnonymizedDataset applyLDiversity(List<Record> records, AnonymizationConfig config) {
LDiversityAnonymizer anonymizer = new LDiversityAnonymizer(
config.getLValue(),
config.getQuasiIdentifiers(),
config.getSensitiveAttribute()
);
return anonymizer.anonymize(records);
}
private AnonymizedDataset applyTCloseness(List<Record> records, AnonymizationConfig config) {
TClosenessAnonymizer anonymizer = new TClosenessAnonymizer(
config.getTValue(),
config.getQuasiIdentifiers(),
config.getSensitiveAttribute()
);
return anonymizer.anonymize(records);
}
private PrivacyMetrics calculatePrivacyMetrics(AnonymizedDataset dataset, 
AnonymizationConfig config) {
// Calculate various privacy and utility metrics
return new PrivacyMetrics(); // Implementation details
}
}
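
The service above references several types the article does not define (DataSourceService, AnonymizationConfigRepository, PrivacyMetrics, AnonymizationResult). As a hint at the shape involved, here is a minimal, hypothetical sketch of the configuration entity; the field names are assumptions for illustration, not an established API.

// Hypothetical configuration object backing AnonymizationConfigRepository;
// shape and names are assumptions, not part of the original article
public class AnonymizationConfig {
private PrivacyModel privacyModel; // L_DIVERSITY or T_CLOSENESS
private int lValue; // used when privacyModel == L_DIVERSITY
private double tValue; // used when privacyModel == T_CLOSENESS
private List<QuasiIdentifier> quasiIdentifiers;
private SensitiveAttribute sensitiveAttribute;
public PrivacyModel getPrivacyModel() { return privacyModel; }
public int getLValue() { return lValue; }
public double getTValue() { return tValue; }
public List<QuasiIdentifier> getQuasiIdentifiers() { return quasiIdentifiers; }
public SensitiveAttribute getSensitiveAttribute() { return sensitiveAttribute; }
}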

Conclusion

L-Diversity and T-Closeness provide essential protections beyond k-anonymity:

Key Advantages:

  • L-Diversity: Prevents attribute disclosure through diversity requirements
  • T-Closeness: Protects against distribution-based attacks through similarity constraints
  • Flexible Implementation: Supports various data types and hierarchies
  • Configurable Privacy: Adjustable parameters for privacy-utility tradeoffs

Implementation Considerations:

  • Algorithm Selection: Choose based on data sensitivity and use case
  • Hierarchy Design: Critical for effective generalization
  • Performance: T-Closeness with EMD can be computationally intensive
  • Utility Preservation: Balance privacy with data usefulness

Production Requirements:

  • Efficient EMD calculation implementations
  • Scalable grouping algorithms for large datasets
  • Comprehensive metrics for privacy and utility
  • Integration with data pipelines and databases

Java's strong typing, rich collections framework, and mathematical capabilities make it well-suited for implementing these sophisticated privacy preservation techniques in enterprise environments.
