Auto PII Tagging
Auto PII tagging for Sensitive/NonSensitive at the column level is performed based on the two approaches described below.
PII Tagging is only available during Profiler Ingestion.
Tagging logic
- Column Name Scanner: We validate the column names of the table against a set of regex rules that match common English naming patterns for email addresses, SSN, bank accounts, etc.
- Entity Recognition: If sample data ingestion is enabled, we'll validate the sample rows against an Entity Recognition engine that flags any sensitive information from a list of supported entities. In that case, the confidence parameter lets you tune the minimum score required to tag a column as PII.Sensitive.
Note that if a column is already tagged as PII, the tagging logic is skipped for that column.
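The tagging logic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the profiler's actual implementation: the regex patterns, the function names scan_column_name and tag_column, the entity_scores mapping, and the 0 to 1 score scale are all hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical name patterns for illustration only; the real rule set
# used by the Column Name Scanner is not reproduced here.
SENSITIVE_NAME_PATTERNS = {
    "EMAIL": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "SSN": re.compile(r"(^|_)ssn(_|$)|social[-_]?security", re.IGNORECASE),
    "BANK_ACCOUNT": re.compile(r"bank[-_]?acc(oun)?t|iban", re.IGNORECASE),
}


def scan_column_name(name: str) -> Optional[str]:
    """Return the label of the first name pattern that matches, if any."""
    for label, pattern in SENSITIVE_NAME_PATTERNS.items():
        if pattern.search(name):
            return label
    return None


def tag_column(
    name: str,
    existing_tags: List[str],
    entity_scores: Optional[Dict[str, float]] = None,
    confidence: float = 0.8,
) -> Optional[str]:
    """Decide whether this sketch would tag a column as PII.Sensitive.

    entity_scores stands in for the output of an entity recognition
    engine run over sample rows (assumed here to be scores from 0 to 1).
    """
    # Columns that already carry a PII tag are left untouched.
    if any(tag.startswith("PII.") for tag in existing_tags):
        return None
    # Approach 1: the column name matches a known sensitive pattern.
    if scan_column_name(name) is not None:
        return "PII.Sensitive"
    # Approach 2: a recognized entity scores at or above the threshold.
    if entity_scores and max(entity_scores.values()) >= confidence:
        return "PII.Sensitive"
    return None
```

For example, tag_column("user_email", []) returns "PII.Sensitive" from the name alone, while a column named "notes" is tagged only when one of its entity scores reaches the confidence threshold. Raising confidence makes the second approach stricter.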
Troubleshooting
SSL: CERTIFICATE_VERIFY_FAILED
If you see an error similar to SSL: CERTIFICATE_VERIFY_FAILED, you are likely hitting a scenario that we identified on some corporate Windows laptops. The profiler is trying to download the Entity Recognition model, but the request fails due to certificate issues.
A solution here is to manually download the model on the ingestion container / Airflow host by running:
If using Docker, you might want to customize the openmetadata-ingestion image to have this command run there by default.