
Best Practices

Recommendations for integrating Questa Anonymizer in production.

Production runs self-hosted. The integration patterns on this page (retries, high-volume chunking, secure map storage) are written for a self-hosted production instance. The hosted demo (demo.questa-ai.online) is for evaluation only. See Hosted vs Self-Hosted.

1. Filter Entity Types

Running with all entity types enabled increases latency and false positives. Scope detection to the entities relevant to your use case.

python
# Scope to relevant entity types only
payload = {
    "text": text,
    "entities": "PERSON_NAME,EMAIL_ADDRESS,PHONE_NUMBER"
}
Use Case | Recommended Entities
Customer support logs | PERSON_NAME,EMAIL_ADDRESS,PHONE_NUMBER,ADDRESS
Financial documents | CREDIT_CARD,IBAN,VAT_NUMBER,NATIONAL_ID
Medical records | PERSON_NAME,DATE,ADDRESS,NATIONAL_ID
HR documents | PERSON_NAME,EMAIL_ADDRESS,PHONE_NUMBER,ADDRESS,NATIONAL_ID
Source code / logs | EMAIL_ADDRESS,API_KEY,LICENSE_KEY,USERNAME
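
To keep these scopes in one place, a small lookup keyed by use case can build the payload. This is only a sketch: `ENTITY_PRESETS`, its keys, and `build_payload` are hypothetical names, and the entity lists are copied from the table above.

```python
# Hypothetical presets mirroring the use-case table; adjust to your own needs.
ENTITY_PRESETS = {
    "support_logs": ["PERSON_NAME", "EMAIL_ADDRESS", "PHONE_NUMBER", "ADDRESS"],
    "financial": ["CREDIT_CARD", "IBAN", "VAT_NUMBER", "NATIONAL_ID"],
    "medical": ["PERSON_NAME", "DATE", "ADDRESS", "NATIONAL_ID"],
    "hr": ["PERSON_NAME", "EMAIL_ADDRESS", "PHONE_NUMBER", "ADDRESS", "NATIONAL_ID"],
    "source_code": ["EMAIL_ADDRESS", "API_KEY", "LICENSE_KEY", "USERNAME"],
}


def build_payload(text, use_case):
    """Return a request payload scoped to the entity types of one use case."""
    try:
        entities = ENTITY_PRESETS[use_case]
    except KeyError:
        raise ValueError(f"Unknown use case: {use_case!r}") from None
    # The payload expects a single comma-separated string, as in the example above.
    return {"text": text, "entities": ",".join(entities)}
```

Failing loudly on an unknown use case avoids silently falling back to all entity types.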

2. Handle Errors Gracefully

python
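import time

import requests

def anonymize_with_retry(client, payload, max_retries=3):
    # client is assumed to be a requests-compatible session that already
    # knows the base URL and sends the bearer token
    for attempt in range(max_retries):
        try:
            response = client.post("/anonymize/text", json=payload)
        except requests.exceptions.RequestException:
            # Network-level failure: back off, or re-raise on the last attempt
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
            continue

        if response.status_code == 200:
            return response.json()

        if response.status_code == 401:
            raise PermissionError("Bearer token is missing or invalid")

        if response.status_code == 429:
            # Rate limited: wait as long as the server asks (default 5 s)
            retry_after = int(response.headers.get("Retry-After", "5"))
            time.sleep(retry_after)
            continue

        if response.status_code >= 500:
            # Server error: exponential backoff
            time.sleep(2 ** attempt)
            continue

        # Any other status (for example 400) is not retryable
        response.raise_for_status()
        raise RuntimeError(f"Unexpected status {response.status_code}")

    # Every attempt was rate limited or hit a server error
    raise RuntimeError(f"Request failed after {max_retries} attempts")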
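
The wait-time decision can also be pulled out into a pure function, which makes the retry policy easy to unit test. The sketch below is an illustration, not part of the Questa API: `retry_delay`, `default_wait`, and `max_wait` are hypothetical names. It also tolerates a `Retry-After` header that holds an HTTP-date rather than a number of seconds, which the HTTP specification allows.

```python
def retry_delay(status_code, headers, attempt, default_wait=5.0, max_wait=60.0):
    """Return the seconds to wait before retrying, or None if not retryable."""
    if status_code == 429:
        raw = headers.get("Retry-After", "")
        try:
            wait = float(raw)
        except ValueError:
            # Header is missing or is an HTTP-date; fall back to a fixed wait.
            wait = default_wait
        return min(max(wait, 0.0), max_wait)
    if status_code >= 500:
        # Exponential backoff (1 s, 2 s, 4 s, ...), capped at max_wait.
        return min(2.0 ** attempt, max_wait)
    # 2xx, 3xx, and other 4xx responses are not retried.
    return None
```

A caller would sleep for the returned value and retry, or stop retrying when the result is `None`.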

3. Use Start Index for Chunked Processing

When processing a large text in chunks, pass start_index with each request so that placeholder numbering continues where the previous chunk left off and stays unique across the whole document.

python
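offset = 0
# split_into_chunks and anonymize_text are your own helpers (not shown here)
for chunk in split_into_chunks(large_text, chunk_size=10000):
    result = anonymize_text(chunk, start_index=offset)
    # Assumes result["entities"] is the number of entities found in this chunk;
    # if your response contains a list instead, add len(result["entities"])
    offset += result["entities"]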
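
The loop above relies on a `split_into_chunks` helper that this page does not define. One possible sketch, assuming plain character-based chunking that prefers to break at whitespace so a word is not cut in half:

```python
def split_into_chunks(text, chunk_size=10000):
    """Yield pieces of at most chunk_size characters, breaking at whitespace when possible."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Find the last space or newline inside the window.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the current chunk
        yield text[start:end]
        start = end
```

Joining the chunks reproduces the input exactly. A multi-word entity such as a full name can still straddle a boundary, so splitting on sentence or paragraph breaks may be a better fit for some inputs.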

4. Store De-anonymisation Maps Securely

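If you need to reverse anonymization, store the map in an encrypted database. The map links placeholders back to the original values, so treat it as being as sensitive as the source data and never expose it to untrusted users.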

python
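import json
from cryptography.fernet import Fernet

# Generate the key once and keep it in a secrets manager, separate from the
# encrypted data; if the key is lost, the map cannot be decrypted
key = Fernet.generate_key()
cipher = Fernet(key)

# result is the response from an earlier anonymize call
encrypted = cipher.encrypt(json.dumps(result["map"]).encode())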
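
For completeness, here is a sketch of the full round trip with the key loaded from the environment instead of being generated inline. The environment variable name `ANONYMIZER_MAP_KEY` and the helper names are assumptions made for illustration, and the map is treated as an ordinary JSON-serializable dictionary.

```python
import json
import os

from cryptography.fernet import Fernet


def load_cipher(env_var="ANONYMIZER_MAP_KEY"):
    """Build a Fernet cipher from a base64 key stored in an environment variable."""
    # env_var is a hypothetical name; a secrets manager works equally well.
    return Fernet(os.environ[env_var].encode())


def encrypt_map(cipher, mapping):
    """Serialize the map to JSON and encrypt it; returns a bytes token."""
    return cipher.encrypt(json.dumps(mapping).encode())


def decrypt_map(cipher, token):
    """Decrypt a token produced by encrypt_map and parse it back into a dict."""
    return json.loads(cipher.decrypt(token).decode())
```

Store only the encrypted token in the database and keep the key elsewhere, so that a leaked table alone does not reveal the original values.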

5. Implement Exponential Backoff

python
import time
from functools import wraps

def retry_with_backoff(max_retries=5, base_delay=1.0):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:  # narrow this to the errors you expect to be transient
                    if attempt == max_retries - 1:
                        raise
                    delay = base_delay * (2 ** attempt)
                    time.sleep(delay)
        return wrapper
    return decorator
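
When many clients fail at once, plain exponential backoff makes them all retry at the same moments. A common variation randomizes each delay (often called full jitter) and retries only on selected exception types. The following is a sketch of that variation, not a replacement prescribed by Questa; `retry_with_jitter`, `max_delay`, and `retry_on` are hypothetical names.

```python
import random
import time
from functools import wraps


def retry_with_jitter(max_retries=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(ConnectionError, TimeoutError)):
    """Retry the wrapped function on selected exceptions with randomized backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries - 1:
                        raise
                    # Sleep a random time between 0 and the capped exponential delay.
                    cap = min(max_delay, base_delay * (2 ** attempt))
                    time.sleep(random.uniform(0, cap))
        return wrapper
    return decorator
```

Exceptions outside `retry_on` propagate immediately, so programming errors are not hidden behind retries.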

6. Security

  • Always send requests over HTTPS.
  • Validate file types before uploading (check MIME type server-side).
  • Set file size limits at your reverse proxy (recommended: 100 MB max).
  • Never hardcode your license key (self-hosted) or evaluation key (hosted demo) in source code. Use environment variables or a secrets manager.
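
Two of these points can be sketched on the client side: reading the key from the environment, and a pre-upload check of file type and size. The variable name `QUESTA_API_KEY`, the allow-list, and both helper names are assumptions for illustration. `mimetypes.guess_type` looks only at the file extension, so this check complements server-side validation and does not replace it.

```python
import mimetypes
import os

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # mirrors the 100 MB recommendation
ALLOWED_TYPES = {"text/plain", "application/pdf"}  # hypothetical allow-list


def load_api_key(env_var="QUESTA_API_KEY"):
    """Read the key from the environment instead of hardcoding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable")
    return key


def check_upload(path, allowed=ALLOWED_TYPES, max_bytes=MAX_UPLOAD_BYTES):
    """Reject files with an unexpected type or size before sending them."""
    guessed, _ = mimetypes.guess_type(path)  # extension-based guess only
    if guessed not in allowed:
        raise ValueError(f"File type not allowed: {guessed}")
    if os.path.getsize(path) > max_bytes:
        raise ValueError("File exceeds the upload size limit")
    return guessed
```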


Questa AI documentation.