
👉 Looking for the newest release? The current flagship is ai4privacy/pii-masking-openpii-1m, with 1.4M samples across 23 languages and 19 PII classes.

p5y Data Analytics

This dataset is built on the p5y framework - think of it as i18n but for privacy. Just as i18n (internationalization) translates content into different locales, p5y translates sensitive data into privacy-safe formats through a standardized 3-step approach:

  1. Awareness - Scan and mark up private entities in unstructured text, producing a structured privacy mask with entity types, distribution, density, and risk assessment.
  2. Protection - Control identified personal data through masking, pseudonymization, or k-anonymization, tailored to the specific use case and regulatory requirements.
  3. Quality Assurance - Measure remaining privacy risk after anonymization, evaluating de-anonymization risks through expert annotation and automated assessment.
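
The three steps above can be sketched in a few lines of Python. This is only an illustrative toy, not the p5y implementation: the regex patterns, the function names (`build_privacy_mask`, `summarize`, `apply_mask`, `residual_entities`), and the `[LABEL]` placeholder format are all assumptions made for this example, and a real pipeline would rely on a trained PII detection model rather than two regular expressions.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; real detection would use a trained model.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def build_privacy_mask(text):
    """Step 1 (Awareness): locate entities and return them as labelled spans."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append({
                "label": label,
                "start": match.start(),
                "end": match.end(),
                "value": match.group(),
            })
    return sorted(spans, key=lambda s: s["start"])

def summarize(text, spans):
    """Step 1, continued: entity distribution and character-level density."""
    counts = Counter(s["label"] for s in spans)
    covered = sum(s["end"] - s["start"] for s in spans)
    density = covered / len(text) if text else 0.0
    return {"distribution": dict(counts), "density": density}

def apply_mask(text, spans):
    """Step 2 (Protection): replace each span with a [LABEL] placeholder.

    Spans are processed right to left so earlier offsets stay valid.
    Assumes the spans do not overlap.
    """
    for s in sorted(spans, key=lambda s: s["start"], reverse=True):
        text = text[:s["start"]] + f"[{s['label']}]" + text[s["end"]:]
    return text

def residual_entities(masked_text):
    """Step 3 (Quality Assurance): re-scan the masked text for leftovers."""
    return build_privacy_mask(masked_text)

sample = "Contact Jane at jane.doe@example.com or +44 20 7946 0958."
spans = build_privacy_mask(sample)
stats = summarize(sample, spans)
masked = apply_mask(sample, spans)
leftovers = residual_entities(masked)

print(masked)     # Contact Jane at [EMAIL] or [PHONE].
print(leftovers)  # []
```

Note that the automated re-scan reports nothing left over even though the name "Jane" is still present in the masked text. The toy detector cannot see what it was never able to detect, which is why the Quality Assurance step pairs automated assessment with expert annotation.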

Learn more at p5y.org


Collection including ai4privacy/pii-masking-400k-extended