This inverts OpenAI's 1.5B Privacy Filter model: instead of masking PII, it extracts structured spans. Each span carries a label, the matched text, and character offsets, covering emails, phones, names, account numbers, secrets, and five other types. The hybrid backend combines model predictions with regex backstops and reaches 0.929 F1 on the project's fixtures, which is solid enough to consider for production use. It runs on CPU in about 600 ms per parse, and faster on GPU. The architecture is clean: BIOES tagging feeds Viterbi decoding, span merging then joins multi-token names, and a final regex pass catches things the model misses, such as API tokens. The first run downloads 3 GB of weights. If you need to find PII before you redact it, or audit what's hiding in logs and databases, this does the job without reinventing NER from scratch.
npx skills add https://github.com/aradotso/trending-skills --skill privacy-parser-pii-extraction
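
The span-merging and regex-backstop steps described above can be illustrated with a minimal Python sketch. It is not the skill's actual code: the `Span` fields, the function names, the label strings (`name`, `secret`, `email`), and both regex patterns are hypothetical, and the BIOES tags are assumed to have already been produced by the model and Viterbi decoding.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str
    text: str
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    source: str  # "model" or "regex"


def bioes_to_spans(text, tokens, tags):
    """Merge per-token BIOES tags into character-level spans.

    tokens: list of (start, end) character offsets, one per token.
    tags:   decoded tag per token, e.g. "B-name", "E-name", "S-email", "O".
    """
    spans = []
    open_label, open_start = None, None
    for (start, end), tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":  # single-token entity
            spans.append(Span(label, text[start:end], start, end, "model"))
            open_label = None
        elif prefix == "B":  # begin a multi-token entity
            open_label, open_start = label, start
        elif prefix == "I" and label == open_label:
            continue  # still inside the open entity
        elif prefix == "E" and label == open_label:  # close the entity
            spans.append(Span(label, text[open_start:end], open_start, end, "model"))
            open_label = None
        else:  # "O" or an inconsistent tag: drop any half-open entity
            open_label = None
    return spans


# Hypothetical backstop patterns, for illustration only.
BACKSTOPS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "secret": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def add_regex_backstops(text, spans):
    """Add regex matches that do not overlap any span already found."""
    merged = list(spans)
    for label, pattern in BACKSTOPS.items():
        for m in pattern.finditer(text):
            overlaps = any(m.start() < s.end and s.start < m.end() for s in merged)
            if not overlaps:
                merged.append(Span(label, m.group(), m.start(), m.end(), "regex"))
    return sorted(merged, key=lambda s: s.start)


if __name__ == "__main__":
    text = "Ask Ada Lovelace for key sk-abcdef0123456789ABCD"
    # Whitespace tokenization stands in for the model's real tokenizer.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    # Pretend the model tagged the name but missed the API token.
    tags = ["O", "B-name", "E-name", "O", "O", "O"]
    for span in add_regex_backstops(text, bioes_to_spans(text, tokens, tags)):
        print(span)
```

In this sketch model spans take priority: a regex match is added only when it overlaps nothing the model already found, which keeps the backstop from duplicating or splitting model output. Whether the real hybrid backend resolves overlaps this way is an assumption.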