Custom Entities
Custom entities let you redact user-defined keywords or phrases that the ML models may not detect, or that require specific placeholder names.
Use Cases
- Project codenames (e.g., "Operation Phoenix" -> [PROJECT_1])
- Internal jargon and acronyms
- Words with double meanings in a given context
- Proprietary product or client names
Format
```json
{
  "custom_entities": [
    {"original": "keyword_or_phrase", "placeholder": "PLACEHOLDER_NAME"}
  ]
}
```

| Field | Type | Description |
|---|---|---|
| original | string | The text to search for (case-insensitive) |
| placeholder | string | The placeholder label (uppercase recommended) |
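As a rough illustration of the format, the request body can be assembled in code. This is a minimal sketch: the helper name `build_payload` is hypothetical and not part of the API, and the `text` field is taken from the text example on this page. It upper-cases placeholder labels only because uppercase is the recommended style.

```python
import json


def build_payload(text, entities):
    """Build a request body in the documented custom_entities format.

    `entities` is a list of (original, placeholder) pairs.
    build_payload is a hypothetical helper, not part of the API.
    """
    return {
        "text": text,
        "custom_entities": [
            # Upper-case the label, following the "uppercase recommended" note.
            {"original": original, "placeholder": placeholder.upper()}
            for original, placeholder in entities
        ],
    }


payload = build_payload(
    "The Phoenix project is managed by John Doe.",
    [("Phoenix project", "project")],
)
print(json.dumps(payload, indent=2))
```

For file uploads, the same list can be serialized with `json.dumps(payload["custom_entities"])` and sent as the `custom_entities` form field.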
Examples
Text
```json
{
  "text": "The Phoenix project is managed by John Doe.",
  "custom_entities": [
    {"original": "Phoenix project", "placeholder": "PROJECT"}
  ]
}
```

Result:
```json
{
  "anonymized_text": "The [PROJECT_1] is managed by [PERSON_NAME_2].",
  "map": {
    "Phoenix project": "[PROJECT_1]",
    "John Doe": "[PERSON_NAME_2]"
  },
  "entities": 2
}
```

File Upload (cURL)
```bash
curl -X POST https://demo.questa-ai.online/anonymize/pdf \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@document.pdf" \
  -F 'custom_entities=[{"original":"Confidential","placeholder":"REDACTED"}]'
```

Behaviour
- Case-insensitive: "building", "BUILDING", and "Building" all match.
- Numbered placeholders: Each occurrence increments: [PROJECT_1], [PROJECT_2], etc.
- Overlap handling: If a custom entity overlaps with an ML-detected entity, the custom placeholder takes priority.
- Works across all endpoints: text, PDF, DOCX, CSV, Excel.
- Multi-word phrases: a phrase of several words, up to a full sentence, is matched as a whole, case-insensitively.
Limitations
- Substring matching means "art" will match within "article", "party", "smart". Use specific phrases.
- Maximum 50 custom entities per request.
- Maximum 200 characters per phrase.
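These limits can be checked on the client before a request is sent. The sketch below is a local illustration only, not the service's implementation: `check_custom_entities` and `words_hit_by` are hypothetical helper names, and the regular expression merely approximates case-insensitive substring matching to show which words a short keyword would catch.

```python
import re

MAX_ENTITIES = 50        # documented per-request limit
MAX_PHRASE_CHARS = 200   # documented per-phrase limit


def check_custom_entities(entities):
    """Return a list of problems found in a custom_entities list."""
    problems = []
    if len(entities) > MAX_ENTITIES:
        problems.append(
            f"{len(entities)} entities exceeds the limit of {MAX_ENTITIES}"
        )
    for entity in entities:
        phrase = entity["original"]
        if len(phrase) > MAX_PHRASE_CHARS:
            problems.append(
                f"phrase exceeds {MAX_PHRASE_CHARS} characters: {phrase[:20]}..."
            )
    return problems


def words_hit_by(keyword, text):
    """List the words in `text` containing `keyword` as a case-insensitive substring."""
    pattern = re.compile(r"\w*" + re.escape(keyword) + r"\w*", re.IGNORECASE)
    return pattern.findall(text)


print(words_hit_by("art", "The article on smart party art"))
# prints ['article', 'smart', 'party', 'art']
```

Running `words_hit_by` against sample text before submitting a keyword is one way to spot unintended matches and decide whether a longer, more specific phrase is needed.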