Three slices, built and tested one at a time. Each slice delivers something usable.
Slice 1, the core: replace known personal data with tokens, then reverse the tokens back to the real values.
| Task | What it does | File |
|---|---|---|
| 1.1 | Project setup — dependencies, .gitignore | pyproject.toml, .gitignore |
| 1.2 | Token map — bidirectional lookup table (real ↔ token) | token_map.py |
| 1.3 | Config loader — read and validate YAML personal data file | config.py |
| 1.4 | Phase 1 — replace all known PII with tokens | phase1.py |
| 1.5 | De-obfuscation — reverse tokens back to real values | deobfuscate.py |
Done when: You can feed in text containing your known personal data, get back a clean version with tokens, then reverse it and recover the original text exactly.
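
The round trip described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `TokenMap` class, the `obfuscate` and `deobfuscate` function names, the `[KIND_N]` token format, and the shape of the `known` dictionary (real value mapped to a kind label) are all assumptions made for the example. It also assumes matching is case-sensitive and that the input text does not already contain strings that look like tokens.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TokenMap:
    """Bidirectional lookup between real values and placeholder tokens (hypothetical design)."""
    real_to_token: dict = field(default_factory=dict)
    token_to_real: dict = field(default_factory=dict)
    counters: dict = field(default_factory=dict)

    def token_for(self, real: str, kind: str) -> str:
        # Reuse the existing token so the same value always maps to the same token.
        if real in self.real_to_token:
            return self.real_to_token[real]
        self.counters[kind] = self.counters.get(kind, 0) + 1
        token = f"[{kind}_{self.counters[kind]}]"  # assumed token format
        self.real_to_token[real] = token
        self.token_to_real[token] = real
        return token


def _alternation(values):
    # Longest first, so "Jane Doe" wins over "Jane" at the same position.
    ordered = sorted(values, key=len, reverse=True)
    return re.compile("|".join(re.escape(v) for v in ordered))


def obfuscate(text: str, known: dict, token_map: TokenMap) -> str:
    """Replace every known value in one pass, so tokens are never re-scanned."""
    if not known:
        return text
    return _alternation(known).sub(
        lambda m: token_map.token_for(m.group(0), known[m.group(0)]), text
    )


def deobfuscate(text: str, token_map: TokenMap) -> str:
    """Swap every token recorded in the map back to its real value."""
    if not token_map.token_to_real:
        return text
    return _alternation(token_map.token_to_real).sub(
        lambda m: token_map.token_to_real[m.group(0)], text
    )


# Example round trip with made-up personal data.
known = {"Jane Doe": "NAME", "Jane": "NAME", "jane@example.com": "EMAIL"}
tokens = TokenMap()
text = "Jane Doe (jane@example.com) called Jane."
clean = obfuscate(text, known, tokens)
restored = deobfuscate(clean, tokens)
```

Here `clean` is `[NAME_1] ([EMAIL_1]) called [NAME_2].` and `restored` equals the original `text`. Doing the replacement in a single regex pass, rather than one `str.replace` per value, avoids accidentally matching a known value inside a token that was just inserted.
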
Slice 2, the safety net: catch personal data that isn't in your config.
| Task | What it does | File |
|---|---|---|
| 2.1 | Phase 2 — Presidio NER detection of unknown PII | phase2.py |
| 2.2 | Confidence threshold tuning — only replace when confident | phase2.py |
Done when: Unknown names, phone numbers, and locations are caught automatically. Things that aren't PII are left alone.
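
The thresholding step can be sketched without running Presidio itself. The `Detection` class below is a stand-in that mirrors the `entity_type`, `start`, `end`, and `score` fields of Presidio's `RecognizerResult`; in the real phase these would come from `AnalyzerEngine.analyze`. The `replace_confident` function, the `span` helper, the 0.6 default threshold, and the scores in the example are assumptions chosen for illustration, not tuned values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Stand-in for a detector result: entity type, character span, confidence."""
    entity_type: str
    start: int
    end: int
    score: float


def replace_confident(text, detections, threshold=0.6):
    """Replace only detections scoring at or above the threshold.

    Returns the rewritten text and a token -> real value mapping.
    """
    confident = sorted(
        (d for d in detections if d.score >= threshold),
        key=lambda d: d.score,
        reverse=True,
    )
    # When spans overlap, keep the higher-scoring one.
    chosen = []
    for d in confident:
        if all(d.end <= c.start or d.start >= c.end for c in chosen):
            chosen.append(d)
    chosen.sort(key=lambda d: d.start)

    real_to_token, token_to_real, counters = {}, {}, {}
    pieces, cursor = [], 0
    for d in chosen:
        real = text[d.start:d.end]
        token = real_to_token.get(real)
        if token is None:
            counters[d.entity_type] = counters.get(d.entity_type, 0) + 1
            token = f"[{d.entity_type}_{counters[d.entity_type]}]"
            real_to_token[real] = token
            token_to_real[token] = real
        pieces.append(text[cursor:d.start])  # untouched text before the span
        pieces.append(token)
        cursor = d.end
    pieces.append(text[cursor:])
    return "".join(pieces), token_to_real


def span(text, needle, entity_type, score):
    """Helper for the example: build a Detection from a substring."""
    start = text.index(needle)
    return Detection(entity_type, start, start + len(needle), score)


# Made-up detections: two real hits and one low-confidence false positive.
text = "Call Maria Lopez at 555-0100 before May."
detections = [
    span(text, "Maria Lopez", "PERSON", 0.85),
    span(text, "555-0100", "PHONE_NUMBER", 0.75),
    span(text, "May", "PERSON", 0.30),
]
clean, mapping = replace_confident(text, detections, threshold=0.6)
```

With these inputs `clean` is `Call [PERSON_1] at [PHONE_NUMBER_1] before May.`: the month name scores below the threshold and is left alone. Raising the threshold reduces false positives at the cost of missing more real PII, which is the trade-off task 2.2 tunes.
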
Slice 3: wire everything together into a single, easy-to-use interface.
| Task | What it does | File |
|---|---|---|
| 3.1 | Shield orchestrator — single entry point for obfuscate/deobfuscate | shield.py |
| 3.2 | Review mode — show what would be changed before sending | shield.py |
Done when: End-to-end flow works — input text with mixed known/unknown PII goes through both phases, AI response comes back de-obfuscated with all real values restored. Review mode shows a clear before/after.
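
One possible shape for the orchestrator is sketched below. The `Shield` class and its `obfuscate`, `deobfuscate`, and `review` methods are hypothetical names, and the phase 2 detector is passed in as a plain callable returning `(start, end, kind)` spans, so the example runs without Presidio. A small phone-number regex stands in for the real detector, and the sketch assumes the detector returns non-overlapping spans that do not fall inside tokens.

```python
import re
from typing import Callable, Optional

# A detector takes text and returns (start, end, kind) spans (assumed interface).
Detector = Callable[[str], list]


class Shield:
    """Single entry point running phase 1 (known PII) then phase 2 (detected PII)."""

    def __init__(self, known: dict, detect_unknown: Optional[Detector] = None):
        self.known = known  # real value -> kind label
        self.detect_unknown = detect_unknown or (lambda text: [])
        self.real_to_token, self.token_to_real, self.counters = {}, {}, {}

    def _token_for(self, real: str, kind: str) -> str:
        if real not in self.real_to_token:
            self.counters[kind] = self.counters.get(kind, 0) + 1
            token = f"[{kind}_{self.counters[kind]}]"
            self.real_to_token[real] = token
            self.token_to_real[token] = real
        return self.real_to_token[real]

    def obfuscate(self, text: str) -> str:
        # Phase 1: known values, longest first, in a single regex pass.
        if self.known:
            ordered = sorted(self.known, key=len, reverse=True)
            pattern = re.compile("|".join(re.escape(v) for v in ordered))
            text = pattern.sub(
                lambda m: self._token_for(m.group(0), self.known[m.group(0)]), text
            )
        # Phase 2: detector spans, applied right to left so offsets stay valid.
        for start, end, kind in sorted(self.detect_unknown(text), reverse=True):
            text = text[:start] + self._token_for(text[start:end], kind) + text[end:]
        return text

    def deobfuscate(self, text: str) -> str:
        if not self.token_to_real:
            return text
        pattern = re.compile("|".join(re.escape(t) for t in self.token_to_real))
        return pattern.sub(lambda m: self.token_to_real[m.group(0)], text)

    def review(self, text: str):
        """Return the obfuscated text plus the (real, token) pairs it contains."""
        clean = self.obfuscate(text)
        changes = [(real, tok) for tok, real in self.token_to_real.items() if tok in clean]
        return clean, changes


def phone_detector(text):
    # Stand-in for the NER phase: flags 7-digit numbers like 555-0100.
    return [(m.start(), m.end(), "PHONE_NUMBER") for m in re.finditer(r"\b\d{3}-\d{4}\b", text)]


shield = Shield({"Jane Doe": "NAME"}, detect_unknown=phone_detector)
outgoing, changes = shield.review("Jane Doe's number is 555-0100.")
reply = shield.deobfuscate("I will text [NAME_1] at [PHONE_NUMBER_1].")
```

In this run `outgoing` is `[NAME_1]'s number is [PHONE_NUMBER_1].`, `changes` lists both substitutions for inspection before anything is sent, and `reply` comes back as `I will text Jane Doe at 555-0100.`. Keeping one token map on the `Shield` instance is what lets the response be reversed with the same tokens that were sent.
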