Quba

Anonymization

Learn how to use Quba's Anonymization API to anonymize sensitive data in text using various techniques like replacement, masking, hashing, and encryption.

import {
  Configuration,
  SensitiveDataProtectionApi,
} from "@quba/sensitive-data-protection"

const text = `Patient John Smith (SSN: 123-45-6789) was admitted to
  Dubai General Hospital on 2024-01-15. Contact: john.smith@email.com`

async function protectData(text: string) {
  // Pass your API key on every request via the x-api-key header.
  const api = new SensitiveDataProtectionApi(
    new Configuration({ headers: { "x-api-key": "quba_..." } }),
  )

  return await api.anonymizeText({
    text,
    rules: [
      {
        type: "replace",
        entities: [{ type: "model", value: "person" }],
        replacement: "[PATIENT]",
      },
      {
        type: "mask",
        entities: [{ type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }],
        masking_char: "*",
        chars_to_mask: 7,
        from_end: false,
      },
      {
        type: "redact",
        entities: [{ type: "model", value: "location" }],
      },
      {
        type: "sha256",
        entities: [{ type: "model", value: "email" }],
      },
    ],
  })
}

const response = await protectData(text)
console.log(response.text)
// Output: Patient [PATIENT] (SSN: ***-**-6789) was admitted to
//   on 2024-01-15. Contact: a1b2c3d4...
from quba_sdp import Client, models
from quba_sdp.api import anonymize_text

text = """Patient John Smith (SSN: 123-45-6789) was admitted to
  Dubai General Hospital on 2024-01-15. Contact: john.smith@email.com"""

# base_url defaults to the production API; pass your key via headers.
client = Client(headers={"x-api-key": "quba_..."})

response = anonymize_text.sync(
    client=client,
    body=models.AnonymizeRequestBody(
        text=text,
        rules=[
            models.ReplaceRule(
                entities=[models.ModelEntity(value="person")],
                replacement="[PATIENT]",
            ),
            models.MaskRule(
                entities=[models.RegexEntity(value=r"\d{3}-\d{2}-\d{4}")],
                masking_char="*",
                chars_to_mask=7,
                from_end=False,
            ),
            models.RedactRule(entities=[models.ModelEntity(value="location")]),
            models.SHA256Rule(entities=[models.ModelEntity(value="email")]),
        ],
    ),
)

print(response.text)
# Output: Patient [PATIENT] (SSN: ***-**-6789) was admitted to
#   on 2024-01-15. Contact: a1b2c3d4...

Use a model entity when you want the AI to detect an entity type, such as person, wherever it appears. Use a regex entity when you need to match a specific pattern, such as an ID format or a date structure.

// Model-based: AI finds the entity type
{ type: "model", value: "person" }

// Regex-based: matches your exact pattern
{ type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }
# Model-based: AI finds the entity type
models.ModelEntity(value="person")

# Regex-based: matches your exact pattern
models.RegexEntity(value=r"\d{3}-\d{2}-\d{4}")
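The TypeScript and Python snippets spell the same SSN pattern differently: TypeScript needs "\\d" inside an ordinary string literal, while Python can use a raw string r"\d". A quick local check with Python's built-in re module confirms that both spellings produce the same pattern and that it matches the sample SSN. This is a sketch for testing patterns before you send them, and it assumes the API's regex dialect treats \d, {n} and - the same way Python's re does.

```python
import re

# Raw string: backslashes reach the regex engine unchanged.
raw_pattern = r"\d{3}-\d{2}-\d{4}"
# Ordinary string with doubled backslashes, as in the TypeScript example.
escaped_pattern = "\\d{3}-\\d{2}-\\d{4}"

# Both literals produce the identical pattern string.
print(raw_pattern == escaped_pattern)  # True

text = "Patient John Smith (SSN: 123-45-6789) was admitted"
matches = [m.group() for m in re.finditer(raw_pattern, text)]
print(matches)  # ['123-45-6789']
```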

Rules

A rule tells the API what to detect and how to anonymize it. Every rule has two required parts:

  • type — the anonymization operation to apply (replace, redact, mask, sha256, sha512, encrypt)
  • entities — one or more targets to detect, each using either a model or a regex pattern
{
  type: "replace",                                    // what to do
  entities: [{ type: "model", value: "person" }],     // what to find
  replacement: "[PATIENT]"                            // anonymization option
}

You pass an array of rules. Each rule is evaluated independently against the full text, so multiple rules can match and anonymize different spans, including overlapping ones. Rules are applied in the order they appear in the array.
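Because every rule needs a known type and a non-empty list of entities, a malformed rule can be caught before the request is sent. `validate_rule` below is a hypothetical client-side helper, not part of the SDK; it works on plain dicts shaped like the TypeScript examples, and its regex syntax check assumes the API's regex dialect is compatible with Python's re.

```python
import re

# Operation and entity names listed on this page.
RULE_TYPES = {"replace", "redact", "mask", "sha256", "sha512", "encrypt"}
ENTITY_TYPES = {"model", "regex"}


def validate_rule(rule: dict) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if rule.get("type") not in RULE_TYPES:
        problems.append(f"unknown rule type: {rule.get('type')!r}")
    entities = rule.get("entities")
    if not isinstance(entities, list) or not entities:
        problems.append("entities must be a non-empty list")
        return problems
    for i, entity in enumerate(entities):
        if entity.get("type") not in ENTITY_TYPES:
            problems.append(f"entities[{i}]: type must be 'model' or 'regex'")
        value = entity.get("value")
        if not isinstance(value, str) or not value:
            problems.append(f"entities[{i}]: value must be a non-empty string")
        elif entity.get("type") == "regex":
            # Assumes the API's regex dialect is close to Python's re.
            try:
                re.compile(value)
            except re.error as exc:
                problems.append(f"entities[{i}]: invalid regex ({exc})")
    return problems


rules = [
    {"type": "replace", "entities": [{"type": "model", "value": "person"}],
     "replacement": "[PATIENT]"},
    # Deliberately broken: the "(" is never closed.
    {"type": "redact", "entities": [{"type": "regex", "value": "[A-Z]{2}(\\d{6}"}]},
]
for rule in rules:
    print(rule["type"], validate_rule(rule))
```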

Entity Targets

Each entity in entities is either a model-based detection or a regex pattern:

| Type | Description | Example |
| --- | --- | --- |
| model | AI detects the entity type | { type: "model", value: "person" } |
| regex | Your pattern matches the text | { type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" } |

A single rule can mix both:

{
  type: "redact",
  entities: [
    { type: "model", value: "email" },
    { type: "regex", value: "[A-Z]{2}\\d{6}" },  // custom ID format
  ]
}
models.RedactRule(
    entities=[
        models.ModelEntity(value="email"),
        models.RegexEntity(value=r"[A-Z]{2}\d{6}"),  # custom ID format
    ]
)

Confidence Threshold

Model-based detections include a score (0.0–1.0) representing the AI's confidence. The confidence_threshold parameter filters out low-confidence detections.

await api.anonymizeText({
  text: "...",
  rules: [...],
  confidence_threshold: 0.8, // only apply rules to high-confidence detections
})
anonymize_text.sync(
    client=client,
    body=models.AnonymizeRequestBody(
        text="...",
        rules=[...],
        confidence_threshold=0.8,  # only apply rules to high-confidence detections
    ),
)
  • Default: 0.5
  • Regex matches are always applied — threshold does not affect them
  • Set higher (e.g. 0.8) to reduce false positives; set lower to catch more entities
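The threshold behaviour described above can be sketched as a filter over detections. The detection records are hypothetical (only the 0.94 PERSON score comes from this page), and `apply_threshold` assumes a score equal to the threshold is kept; the page does not say whether the comparison is inclusive.

```python
# Hypothetical detections; regex matches carry no score.
detections = [
    {"type": "model", "value": "PERSON", "score": 0.94, "text": "John Smith"},
    {"type": "model", "value": "LOCATION", "score": 0.62, "text": "Dubai"},
    {"type": "regex", "value": r"\d{3}-\d{2}-\d{4}", "text": "123-45-6789"},
]


def apply_threshold(detections, confidence_threshold=0.5):
    """Keep every regex match, and model detections scoring at or above the threshold."""
    return [
        d for d in detections
        if d["type"] == "regex" or d["score"] >= confidence_threshold
    ]


print([d["text"] for d in apply_threshold(detections)])       # ['John Smith', 'Dubai', '123-45-6789']
print([d["text"] for d in apply_threshold(detections, 0.8)])  # ['John Smith', '123-45-6789']
```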

Anonymization Types

Replace

Replace detected entities with a custom string:

{
  type: "replace",
  entities: [{ type: "model", value: "person" }],
  replacement: "[REDACTED]"   // default: "****"
}
models.ReplaceRule(
    entities=[models.ModelEntity(value="person")],
    replacement="[REDACTED]",  # default: "****"
)
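The replace operation can be pictured as substituting each detected [start, end) span with a fixed string. The function below is a local sketch of that idea, not the API's implementation; `replace_spans` is a hypothetical helper, and the span offsets are counted by hand for the sample sentence.

```python
def replace_spans(text, spans, replacement="****"):
    """Substitute each [start, end) span with `replacement`.

    Spans are processed right to left so earlier offsets stay valid.
    Assumes the spans do not overlap.
    """
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


original = "Patient John Smith (SSN: 123-45-6789)"
person = (8, 18)  # "John Smith"; the end offset is exclusive
print(replace_spans(original, [person], "[REDACTED]"))  # Patient [REDACTED] (SSN: 123-45-6789)
print(replace_spans(original, [person]))               # Patient **** (SSN: 123-45-6789)
```

Passing an empty string as the replacement gives the removal behaviour described under Redact. How the API itself resolves overlapping matches is not covered by this sketch.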

Redact

Remove detected entities entirely:

{
  type: "redact",
  entities: [{ type: "model", value: "email" }]
}
models.RedactRule(entities=[models.ModelEntity(value="email")])

Mask

Mask some or all characters of each detected entity:

{
  type: "mask",
  entities: [{ type: "model", value: "phone" }],
  masking_char: "*",     // default: "*"
  chars_to_mask: 8,      // null = mask all characters
  from_end: true         // false = mask from start
}
// +1-555-123-4567 → +1-555-***-****
models.MaskRule(
    entities=[models.ModelEntity(value="phone")],
    masking_char="*",      # default: "*"
    chars_to_mask=8,       # None = mask all characters
    from_end=True,         # False = mask from start
)
# +1-555-123-4567 → +1-555-***-****
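The mask options can be combined as in the local sketch below. It is inferred from the phone example above and assumes that separator characters count toward chars_to_mask but stay visible. `mask` and its `keep` parameter are hypothetical, and the from_end default of False is chosen here only for illustration, since this page does not state the API's default.

```python
def mask(value, masking_char="*", chars_to_mask=None, from_end=False, keep="-"):
    """Local sketch of the mask rule's options (not the API's implementation).

    Assumption inferred from the phone example: characters listed in `keep`
    (a hypothetical parameter) count toward chars_to_mask but stay visible.
    """
    # None means "mask every character", as documented for chars_to_mask.
    n = len(value) if chars_to_mask is None else min(chars_to_mask, len(value))
    # Positions of the n characters to mask, counted from the end or the start.
    positions = range(len(value) - n, len(value)) if from_end else range(n)
    chars = list(value)
    for i in positions:
        if chars[i] not in keep:
            chars[i] = masking_char
    return "".join(chars)


print(mask("+1-555-123-4567", "*", 8, from_end=True))  # +1-555-***-****
print(mask("123-45-6789", "*", 7, from_end=False))     # ***-**-6789
print(mask("123-45-6789"))                             # ***-**-****
```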

SHA256 / SHA512

Replace detected entities with their SHA-256 or SHA-512 hash:

{
  type: "sha256",   // or "sha512"
  entities: [{ type: "model", value: "id" }]
}
models.SHA256Rule(entities=[models.ModelEntity(value="id")])
# or models.SHA512Rule(...)
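Hashing is deterministic: the same input always yields the same digest, so a hashed email can still be counted or joined across records without exposing the address. The snippet uses Python's hashlib to show the digest lengths (64 hex characters for SHA-256, 128 for SHA-512). It assumes the API hashes the UTF-8 bytes of the matched text without a salt and returns lowercase hex; this page confirms none of that, so local digests may not match the API's output.

```python
import hashlib

email = "john.smith@email.com"

# Hex digests of the UTF-8 bytes; equal inputs always give equal digests.
sha256_hex = hashlib.sha256(email.encode("utf-8")).hexdigest()
sha512_hex = hashlib.sha512(email.encode("utf-8")).hexdigest()

print(len(sha256_hex), len(sha512_hex))  # 64 128
print(sha256_hex == hashlib.sha256(b"john.smith@email.com").hexdigest())  # True
```

Because unsalted hashes are deterministic, short, low-entropy values such as SSNs can be recovered by hashing every possible value and comparing the results, so Encrypt or Replace may suit those entities better.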

Encrypt

Encrypt detected entities with a key:

{
  type: "encrypt",
  entities: [{ type: "model", value: "person" }],
  key: "your-encryption-key"   // default: ""
}
models.EncryptRule(
    entities=[models.ModelEntity(value="person")],
    key="your-encryption-key",  # default: ""
)

Regex Patterns

You can also target text using regex patterns:

{
  type: "replace",
  entities: [{ type: "regex", value: "\\d{4}-\\d{2}-\\d{2}" }],
  replacement: "[DATE]"
}
models.ReplaceRule(
    entities=[models.RegexEntity(value=r"\d{4}-\d{2}-\d{2}")],
    replacement="[DATE]",
)
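Unanchored patterns also match inside longer tokens. This local check with Python's re module shows the date pattern above matching inside a longer digit run, and how \b word boundaries prevent that. It assumes the API's regex engine supports \b the way Python's does; in the TypeScript form the boundary would be written "\\b".

```python
import re

text = "Admitted 2024-01-15; record 12024-01-155."

loose = r"\d{4}-\d{2}-\d{2}"
bounded = r"\b\d{4}-\d{2}-\d{2}\b"

# The loose pattern also fires inside "12024-01-155".
print(re.findall(loose, text))    # ['2024-01-15', '2024-01-15']
# Word boundaries restrict matches to standalone dates.
print(re.findall(bounded, text))  # ['2024-01-15']
```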

Rule Application Results

When anonymization runs, every entity that was detected and anonymized produces a result record. These records are returned in response.results as an ordered audit trail — one entry per matched span.

const response = await api.anonymizeText({ text, rules })

console.log(response.text) // the anonymized string
console.log(response.results) // one record per anonymization applied
response = anonymize_text.sync(
    client=client,
    body=models.AnonymizeRequestBody(text=text, rules=rules),
)

print(response.text)     # the anonymized string
print(response.results)  # one record per anonymization applied

Each result tells you:

  • which rule was applied (rule)
  • what was detected (value — entity type or regex pattern)
  • where it was in the original text (input)
  • where it ended up in the anonymized text (output)

A result is either a model result (AI detection) or a regex result (pattern match), identified by type:

Model Result

{
  type: "model",
  rule: "replace",            // rule type that was applied
  value: "PERSON",            // entity type the model detected
  score: 0.94,                // AI confidence (0.0–1.0)
  input:  { start: 8, end: 18, value: "John Smith" },   // position in original text
  output: { start: 8, end: 17, value: "[PATIENT]"  },   // position in result text
}

Regex Result

{
  type: "regex",
  rule: "mask",                                          // rule type that was applied
  value: "\\d{3}-\\d{2}-\\d{4}",                        // pattern that matched
  input:  { start: 25, end: 36, value: "123-45-6789" }, // position in original text
  output: { start: 24, end: 35, value: "***-**-6789" }, // position in result text
}
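Since response.results is an audit trail, one common use is a per-rule summary of what was changed. The records below are hypothetical dicts carrying the fields shown in the two result examples (the SDK may return typed objects, in which case use attribute access), and the second PERSON record is invented for illustration.

```python
from collections import Counter

# Hypothetical response.results; the regex result has no score field.
results = [
    {"type": "model", "rule": "replace", "value": "PERSON", "score": 0.94},
    {"type": "model", "rule": "replace", "value": "PERSON", "score": 0.88},
    {"type": "regex", "rule": "mask", "value": r"\d{3}-\d{2}-\d{4}"},
]

# Count how many spans each (rule, detected value) pair anonymized.
summary = Counter((r["rule"], r["value"]) for r in results)
for (rule, value), count in sorted(summary.items()):
    print(f"{rule}\t{value}\t{count}")

# Lowest model confidence in this batch, e.g. to spot borderline detections.
lowest = min(r["score"] for r in results if r["type"] == "model")
print(lowest)  # 0.88
```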

TextRange Fields

| Field | Type | Description |
| --- | --- | --- |
| start | number | Start character offset (inclusive) |
| end | number | End character offset (exclusive) |
| value | string | The substring at [start, end) |

Results are ordered by position in the output text. Use input offsets to map back to the original text, or output offsets to highlight spans in the anonymized result.
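The half-open [start, end) convention means each value equals a plain slice of its text, which makes offsets easy to check and to use for highlighting. This sketch uses the sample sentence with offsets counted by hand; `highlight` is a hypothetical helper that wraps output ranges in markers, working right to left so inserted markers do not shift ranges that have not been processed yet.

```python
original = "Patient John Smith (SSN: 123-45-6789)"
anonymized = "Patient [PATIENT] (SSN: ***-**-6789)"

# TextRange offsets for the two anonymized spans; end is exclusive.
results = [
    {"rule": "replace", "input": {"start": 8, "end": 18, "value": "John Smith"},
     "output": {"start": 8, "end": 17, "value": "[PATIENT]"}},
    {"rule": "mask", "input": {"start": 25, "end": 36, "value": "123-45-6789"},
     "output": {"start": 24, "end": 35, "value": "***-**-6789"}},
]

# Each value is exactly the slice [start:end] of its own text.
for r in results:
    assert original[r["input"]["start"]:r["input"]["end"]] == r["input"]["value"]
    assert anonymized[r["output"]["start"]:r["output"]["end"]] == r["output"]["value"]


def highlight(text, ranges, left="<<", right=">>"):
    """Wrap each [start, end) range in markers, right to left so offsets stay valid."""
    for start, end in sorted(ranges, reverse=True):
        text = text[:start] + left + text[start:end] + right + text[end:]
    return text


print(highlight(anonymized, [(r["output"]["start"], r["output"]["end"]) for r in results]))
# Patient <<[PATIENT]>> (SSN: <<***-**-6789>>)
```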

© 2026 Quba. All rights reserved.