
Custom Entities

Custom entities let you redact user-defined keywords or phrases that the ML models may not detect, or that require specific placeholder names.

Use Cases

  • Project codenames (e.g., "Operation Phoenix" -> [PROJECT_1])
  • Internal jargon and acronyms
  • Words with double meanings in a given context
  • Proprietary product or client names

Format

json
{
  "custom_entities": [
    {"original": "keyword_or_phrase", "placeholder": "PLACEHOLDER_NAME"}
  ]
}

| Field | Type | Description |
|-------|------|-------------|
| original | string | The text to search for (case-insensitive) |
| placeholder | string | The placeholder label (uppercase recommended) |
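
As a minimal sketch, the format above can be assembled in Python before sending a request. The helper names `make_custom_entity` and `build_payload` are hypothetical and not part of any Questa AI client. Uppercasing the placeholder follows the recommendation above and is not a documented requirement.

```python
import json


def make_custom_entity(original: str, placeholder: str) -> dict:
    """Return one custom-entity object in the documented shape."""
    # Uppercase is only recommended by the docs; this helper applies it by choice.
    return {"original": original, "placeholder": placeholder.upper()}


def build_payload(text: str, entities: list) -> str:
    """Serialize a request body holding the text and its custom entities."""
    return json.dumps({"text": text, "custom_entities": entities})


body = build_payload(
    "The Phoenix project is managed by John Doe.",
    [make_custom_entity("Phoenix project", "project")],
)
print(body)
```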

Examples

Text

json
{
  "text": "The Phoenix project is managed by John Doe.",
  "custom_entities": [
    {"original": "Phoenix project", "placeholder": "PROJECT"}
  ]
}

Result:

json
{
  "anonymized_text": "The [PROJECT_1] is managed by [PERSON_NAME_2].",
  "map": {
    "Phoenix project": "[PROJECT_1]",
    "John Doe": "[PERSON_NAME_2]"
  },
  "entities": 2
}
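
The `map` in the result pairs each original string with its placeholder, so a client could use it to restore the text later. The following is a sketch under that assumption. `restore_text` is a hypothetical helper, not an API function.

```python
def restore_text(anonymized_text: str, mapping: dict) -> str:
    """Replace each placeholder with the original text it stands for."""
    restored = anonymized_text
    for original, placeholder in mapping.items():
        # Placeholders include their brackets, so [PROJECT_1] cannot
        # accidentally match inside [PROJECT_10].
        restored = restored.replace(placeholder, original)
    return restored


response = {
    "anonymized_text": "The [PROJECT_1] is managed by [PERSON_NAME_2].",
    "map": {
        "Phoenix project": "[PROJECT_1]",
        "John Doe": "[PERSON_NAME_2]",
    },
    "entities": 2,
}
print(restore_text(response["anonymized_text"], response["map"]))
# prints: The Phoenix project is managed by John Doe.
```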

File Upload (cURL)

bash
curl -X POST https://demo.questa-ai.online/anonymize/pdf \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@document.pdf" \
  -F 'custom_entities=[{"original":"Confidential","placeholder":"REDACTED"}]'
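
For reference, a rough Python equivalent of the cURL call might look like the sketch below. It assumes the third-party `requests` package is installed. The function names `build_form_fields` and `upload_pdf` are hypothetical, and the response shape for file endpoints is not described here, so the sketch returns the raw response.

```python
import json

API_URL = "https://demo.questa-ai.online/anonymize/pdf"  # endpoint from the cURL example


def build_form_fields(entities: list) -> dict:
    """Encode custom entities as the JSON string sent in the form field."""
    # Compact separators produce the same JSON text as the cURL example.
    return {"custom_entities": json.dumps(entities, separators=(",", ":"))}


def upload_pdf(path: str, token: str, entities: list):
    """Send the PDF as multipart form data, mirroring the -F flags of cURL."""
    import requests  # imported lazily so build_form_fields works without it

    with open(path, "rb") as handle:
        return requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {token}"},
            files={"file": handle},
            data=build_form_fields(entities),
        )


fields = build_form_fields([{"original": "Confidential", "placeholder": "REDACTED"}])
print(fields["custom_entities"])
```

Calling `upload_pdf("document.pdf", "YOUR_API_TOKEN", [...])` would then perform the upload. The list must be serialized to a JSON string because it travels as an ordinary form field.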

Behaviour

  • Case-insensitive: "building", "BUILDING", and "Building" all match.
  • Numbered placeholders: each occurrence increments the numeric suffix, e.g. [PROJECT_1], [PROJECT_2].
  • Overlap handling: If a custom entity overlaps with an ML-detected entity, the custom placeholder takes priority.
  • Works across all endpoints: text, PDF, DOCX, CSV, Excel.
  • Multi-word phrases: phrases of several words, up to full sentences, are matched case-insensitively.
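
To preview which spans a custom entity is likely to hit, the case-insensitive matching described above can be approximated locally. This is only an illustration and not the service's implementation. It does not reproduce placeholder numbering or the overlap handling with ML-detected entities, and `find_custom_matches` is a hypothetical name.

```python
import re


def find_custom_matches(text: str, entities: list) -> list:
    """Return (start, end, matched_text, placeholder) for each case-insensitive hit."""
    matches = []
    for entity in entities:
        # re.escape treats the phrase literally; IGNORECASE mirrors the documented behaviour.
        pattern = re.compile(re.escape(entity["original"]), re.IGNORECASE)
        for hit in pattern.finditer(text):
            matches.append((hit.start(), hit.end(), hit.group(0), entity["placeholder"]))
    return sorted(matches)


sample = "The building, the BUILDING, and the Building annex."
for match in find_custom_matches(sample, [{"original": "building", "placeholder": "SITE"}]):
    print(match)
```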

Limitations

  • Matching is substring-based, so "art" also matches inside "article", "party", and "smart". Use longer, more specific phrases to avoid this.
  • Maximum 50 custom entities per request.
  • Maximum 200 characters per phrase.
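
These limits can be checked on the client before a request is sent. The sketch below assumes the 200-character limit applies to the `original` field and that length is counted as Python's `len()` counts it. Both helper names are hypothetical. `words_containing` is a rough check for the substring issue and is only meaningful for single-word phrases.

```python
import re

MAX_ENTITIES = 50        # documented per-request limit
MAX_PHRASE_LENGTH = 200  # documented per-phrase character limit


def validate_custom_entities(entities: list) -> None:
    """Raise ValueError if the list breaks one of the documented limits."""
    if len(entities) > MAX_ENTITIES:
        raise ValueError(f"{len(entities)} custom entities exceed the limit of {MAX_ENTITIES}")
    for entity in entities:
        if len(entity["original"]) > MAX_PHRASE_LENGTH:
            raise ValueError(f"phrase exceeds {MAX_PHRASE_LENGTH} characters")


def words_containing(phrase: str, text: str) -> list:
    """List words that contain the phrase inside a longer word (likely false positives)."""
    needle = phrase.lower()
    words = re.findall(r"\w+", text)
    return [w for w in words if needle in w.lower() and w.lower() != needle]


print(words_containing("art", "An article about art at a smart party"))
# prints: ['article', 'smart', 'party']
```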


Questa AI documentation.