Anonymization

Learn how to use Quba's Anonymization API to anonymize sensitive data in text by replacing, redacting, masking, hashing, or encrypting detected entities.

import {
  Configuration,
  SensitiveDataProtectionApi,
} from "@quba/sensitive-data-protection"

const text = `Patient John Smith (SSN: 123-45-6789) was admitted to
  Dubai General Hospital on 2024-01-15. Contact: john.smith@email.com`

async function protectData(text: string) {
  // Pass your API key on every request via the x-api-key header.
  const api = new SensitiveDataProtectionApi(
    new Configuration({ headers: { "x-api-key": "quba_..." } }),
  )

  return await api.anonymizeText({
    text,
    rules: [
      {
        type: "replace",
        entities: [{ type: "model", value: "person" }],
        replacement: "[PATIENT]",
      },
      {
        type: "mask",
        entities: [{ type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }],
        masking_char: "*",
        chars_to_mask: 7,
        from_end: false, // mask the first 7 characters, keeping the last four digits
      },
      {
        type: "redact",
        entities: [{ type: "model", value: "location" }],
      },
      {
        type: "sha256",
        entities: [{ type: "model", value: "email" }],
      },
    ],
  })
}

const response = await protectData(text)
console.log(response.text)
// Output: Patient [PATIENT] (SSN: ***-**-6789) was admitted to
// [REDACTED] on 2024-01-15. Contact: a1b2c3d4...

from quba_sdp import Client, models
from quba_sdp.api import anonymize_text

text = """Patient John Smith (SSN: 123-45-6789) was admitted to
  Dubai General Hospital on 2024-01-15. Contact: john.smith@email.com"""

# base_url defaults to the production API; pass your key via headers.
client = Client(headers={"x-api-key": "quba_..."})

response = anonymize_text.sync(
    client=client,
    body=models.AnonymizeRequestBody(
        text=text,
        rules=[
            models.ReplaceRule(
                entities=[models.ModelEntity(value="person")],
                replacement="[PATIENT]",
            ),
            models.MaskRule(
                entities=[models.RegexEntity(value=r"\d{3}-\d{2}-\d{4}")],
                masking_char="*",
                chars_to_mask=7,
                from_end=False,  # mask the first 7 characters, keeping the last four digits
            ),
            models.RedactRule(entities=[models.ModelEntity(value="location")]),
            models.SHA256Rule(entities=[models.ModelEntity(value="email")]),
        ],
    ),
)

print(response.text)
# Output: Patient [PATIENT] (SSN: ***-**-6789) was admitted to
# [REDACTED] on 2024-01-15. Contact: a1b2c3d4...

Use a model entity when you want the AI to detect an entity type automatically. Use a regex entity when you need to target a specific pattern, such as an ID format or a date structure.

// Model-based: AI finds the entity type
{ type: "model", value: "person" }

// Regex-based: matches your exact pattern
{ type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }

# Model-based: AI finds the entity type
models.ModelEntity(value="person")

# Regex-based: matches your exact pattern
models.RegexEntity(value=r"\d{3}-\d{2}-\d{4}")

Rules

A rule tells the API what to detect and how to anonymize it. Every rule has two required parts:

type — the anonymization operation to apply (replace, redact, mask, sha256, sha512, encrypt)
entities — one or more targets to detect, each using either a model or a regex pattern

{
  type: "replace",                                    // what to do
  entities: [{ type: "model", value: "person" }],     // what to find
  replacement: "[PATIENT]"                            // anonymization option
}

You pass an array of rules. Each rule is evaluated independently against the full text — multiple rules can match and anonymize different spans, including overlapping ones. Rules are applied in order.
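
Because every rule needs a type and one or more entities, it can help to check rules locally before sending a request. The following is a minimal Python sketch that works on plain dictionaries shaped like the JSON rules above. `validate_rule`, `RULE_TYPES`, and `ENTITY_TYPES` are hypothetical helpers written for this illustration, not part of the Quba SDK, and the API may enforce constraints that this check does not cover.

```python
# Hypothetical local check; not part of the Quba SDK.
RULE_TYPES = {"replace", "redact", "mask", "sha256", "sha512", "encrypt"}
ENTITY_TYPES = {"model", "regex"}


def validate_rule(rule: dict) -> list[str]:
    """Return a list of problems found in a rule dict (empty if none)."""
    problems = []
    if rule.get("type") not in RULE_TYPES:
        problems.append(f"unknown rule type: {rule.get('type')!r}")
    entities = rule.get("entities")
    if not isinstance(entities, list) or not entities:
        problems.append("entities must be a non-empty list")
        return problems
    for i, entity in enumerate(entities):
        if entity.get("type") not in ENTITY_TYPES:
            problems.append(f"entities[{i}]: unknown entity type: {entity.get('type')!r}")
        if not isinstance(entity.get("value"), str) or not entity.get("value"):
            problems.append(f"entities[{i}]: value must be a non-empty string")
    return problems


rules = [
    {
        "type": "replace",
        "entities": [{"type": "model", "value": "person"}],
        "replacement": "[PATIENT]",
    },
    {"type": "hash", "entities": []},  # invalid on purpose
]
for rule in rules:
    print(rule.get("type"), validate_rule(rule))
```

The first rule passes with no problems; the second is reported twice, once for its unknown type and once for its empty entities list.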

Entity Targets

Each entity in entities is either a model-based detection or a regex pattern:

Type	Description	Example
`model`	AI detects the entity type	`{ type: "model", value: "person" }`
`regex`	Your pattern matches the text	`{ type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }`

A single rule can mix both:

{
  type: "redact",
  entities: [
    { type: "model", value: "email" },
    { type: "regex", value: "[A-Z]{2}\\d{6}" },  // custom ID format
  ]
}

models.RedactRule(
    entities=[
        models.ModelEntity(value="email"),
        models.RegexEntity(value=r"[A-Z]{2}\d{6}"),  # custom ID format
    ]
)

Confidence Threshold

Model-based detections include a score (0.0–1.0) representing the AI's confidence. The confidence_threshold parameter filters out low-confidence detections.

await api.anonymizeText({
  text: "...",
  rules: [...],
  confidence_threshold: 0.8, // only apply rules to high-confidence detections
})

anonymize_text.sync(
    client=client,
    body=models.AnonymizeRequestBody(
        text="...",
        rules=[...],
        confidence_threshold=0.8,  # only apply rules to high-confidence detections
    ),
)

Default: 0.5
Regex matches are always applied — threshold does not affect them
Set higher (e.g. 0.8) to reduce false positives; set lower to catch more entities
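
To make the filtering behavior concrete, here is a small Python sketch that mimics it on local data. It does not call the API: `keep_detection` is a hypothetical helper, the detections and their scores are made-up sample values, and treating a score equal to the threshold as kept is an assumption, since the boundary behavior is not stated here.

```python
# Illustrative only: mimics the documented filtering rule on local data.
def keep_detection(detection: dict, confidence_threshold: float = 0.5) -> bool:
    """Regex detections always pass; model detections must meet the threshold."""
    if detection["type"] == "regex":
        return True
    # Assumption: a score equal to the threshold is kept.
    return detection["score"] >= confidence_threshold


# Made-up sample detections for the sketch.
detections = [
    {"type": "model", "value": "person", "score": 0.94},
    {"type": "model", "value": "location", "score": 0.62},
    {"type": "regex", "value": r"\d{3}-\d{2}-\d{4}"},
]

kept = [d["value"] for d in detections if keep_detection(d, confidence_threshold=0.8)]
print(kept)
```

With a threshold of 0.8, the `location` detection is dropped while the regex match survives regardless of the threshold.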

Anonymization Types

Replace

Replace detected entities with a custom string:

{
  type: "replace",
  entities: [{ type: "model", value: "person" }],
  replacement: "[REDACTED]"   // default: "****"
}

models.ReplaceRule(
    entities=[models.ModelEntity(value="person")],
    replacement="[REDACTED]",  # default: "****"
)

Redact

Remove detected entities entirely:

{
  type: "redact",
  entities: [{ type: "model", value: "email" }]
}

models.RedactRule(entities=[models.ModelEntity(value="email")])

Mask

Mask part of the detected entity:

{
  type: "mask",
  entities: [{ type: "model", value: "phone" }],
  masking_char: "*",     // default: "*"
  chars_to_mask: 8,      // null = mask all characters
  from_end: true         // false = mask from start
}
// +1-555-123-4567 → +1-555-***-****

models.MaskRule(
    entities=[models.ModelEntity(value="phone")],
    masking_char="*",      # default: "*"
    chars_to_mask=8,       # None = mask all characters
    from_end=True,         # False = mask from start
)
# +1-555-123-4567 → +1-555-***-****
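
To see how the three options interact, the following Python sketch reproduces the example above locally. It is not the API's implementation: `mask` is a hypothetical helper, its defaults are choices made for the sketch, and the handling of punctuation (separators inside the masked window count toward `chars_to_mask` but are left unmasked) is an assumption inferred from the phone number example.

```python
from typing import Optional


def mask(value: str, masking_char: str = "*",
         chars_to_mask: Optional[int] = None, from_end: bool = False) -> str:
    """Mask a window of `chars_to_mask` characters at the start or end of `value`.

    Assumption (inferred from the example above): punctuation inside the
    window counts toward `chars_to_mask` but is left unmasked.
    """
    n = len(value) if chars_to_mask is None else min(chars_to_mask, len(value))
    start = len(value) - n if from_end else 0
    chars = list(value)
    for i in range(start, start + n):
        if chars[i].isalnum():
            chars[i] = masking_char
    return "".join(chars)


print(mask("+1-555-123-4567", chars_to_mask=8, from_end=True))  # +1-555-***-****
print(mask("123-45-6789", chars_to_mask=7, from_end=False))     # ***-**-6789
print(mask("4567"))                                             # ****
```

Masking from the end hides the trailing characters, masking from the start hides the leading ones, and omitting `chars_to_mask` masks the whole value.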

SHA256 / SHA512

Replace detected entities with their SHA-256 or SHA-512 hash:

{
  type: "sha256",   // or "sha512"
  entities: [{ type: "model", value: "id" }]
}

models.SHA256Rule(entities=[models.ModelEntity(value="id")])
# or models.SHA512Rule(...)
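
To get a feel for what hashed output looks like, you can compute digests locally with Python's standard `hashlib` module. This is only a sketch: it assumes the value is hashed as UTF-8 with no salt and rendered as hexadecimal, and the API's actual encoding may differ.

```python
import hashlib

email = "john.smith@email.com"

# Assumption: the value is hashed as UTF-8 with no salt; the API may differ.
sha256_hex = hashlib.sha256(email.encode("utf-8")).hexdigest()
sha512_hex = hashlib.sha512(email.encode("utf-8")).hexdigest()

print(len(sha256_hex))  # 64 hex characters (256 bits)
print(len(sha512_hex))  # 128 hex characters (512 bits)

# Hashing is deterministic: the same input always yields the same digest.
same_again = hashlib.sha256(email.encode("utf-8")).hexdigest()
print(sha256_hex == same_again)  # True
```

Because identical inputs produce identical digests, hashed values can still be compared or joined across records without exposing the original text.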

Encrypt

Encrypt detected entities with a key:

{
  type: "encrypt",
  entities: [{ type: "model", value: "person" }],
  key: "your-encryption-key"   // default: ""
}

models.EncryptRule(
    entities=[models.ModelEntity(value="person")],
    key="your-encryption-key",  # default: ""
)

Regex Patterns

You can also target text using regex patterns:

{
  type: "replace",
  entities: [{ type: "regex", value: "\\d{4}-\\d{2}-\\d{2}" }],
  replacement: "[DATE]"
}

models.ReplaceRule(
    entities=[models.RegexEntity(value=r"\d{4}-\d{2}-\d{2}")],
    replacement="[DATE]",
)
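
Before sending a pattern to the API, it can be useful to try it against sample text locally. The sketch below uses Python's standard `re` module on a single-line version of the sample text from the top of this page. Note the assumption: the API's regex dialect is not specified here and may differ from Python's, so treat a local match as a sanity check rather than a guarantee.

```python
import re

# Single-line version of the sample text used earlier on this page.
text = (
    "Patient John Smith (SSN: 123-45-6789) was admitted to "
    "Dubai General Hospital on 2024-01-15. Contact: john.smith@email.com"
)

patterns = {
    "date": r"\d{4}-\d{2}-\d{2}",
    "ssn": r"\d{3}-\d{2}-\d{4}",
}

# For each pattern, collect (start, end, matched text) for every match.
matches = {
    name: [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]
    for name, pattern in patterns.items()
}
for name, found in matches.items():
    print(name, found)
```

Each pattern should match exactly one span in this text; if a pattern matches more than you expect, tighten it before using it in a rule.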

Rule Application Results

When anonymization runs, every entity that was detected and anonymized produces a result record. These records are returned in response.results as an ordered audit trail — one entry per matched span.

const response = await api.anonymizeText({ text, rules })

console.log(response.text) // the anonymized string
console.log(response.results) // one record per anonymization applied

response = anonymize_text.sync(
    client=client,
    body=models.AnonymizeRequestBody(text=text, rules=rules),
)

print(response.text)     # the anonymized string
print(response.results)  # one record per anonymization applied

Each result tells you:

which rule was applied (rule)
what was detected (value — entity type or regex pattern)
where it was in the original text (input)
where it ended up in the anonymized text (output)

A result is either a model result (AI detection) or a regex result (pattern match), identified by type:

Model Result

{
  type: "model",
  rule: "replace",            // rule type that was applied
  value: "PERSON",            // entity type the model detected
  score: 0.94,                // AI confidence (0.0–1.0)
  input:  { start: 8, end: 18, value: "John Smith" },   // position in original text
  output: { start: 8, end: 17, value: "[PATIENT]"  },   // position in result text
}

Regex Result

{
  type: "regex",
  rule: "mask",                                          // rule type that was applied
  value: "\\d{3}-\\d{2}-\\d{4}",                        // pattern that matched
  input:  { start: 25, end: 36, value: "123-45-6789" }, // position in original text
  output: { start: 24, end: 35, value: "***-**-6789" }, // position in result text
}

TextRange Fields

Field	Type	Description
`start`	number	Start character offset (inclusive)
`end`	number	End character offset (exclusive)
`value`	string	The substring at `[start, end)`

Results are ordered by position in the output text. Use input offsets to map back to the original text, or output offsets to highlight spans in the anonymized result.
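
As an illustration of working with these offsets, the Python sketch below slices both texts using the half-open `[start, end)` ranges and wraps each anonymized span in markers. It operates on plain dictionaries that mirror the model result shown above; `span_text` and `highlight` are hypothetical helpers, and the SDK may expose results as objects rather than dictionaries.

```python
original = "Patient John Smith (SSN: 123-45-6789)"
anonymized = "Patient [PATIENT] (SSN: ***-**-6789)"

# One record mirroring the model result shown above.
results = [
    {
        "type": "model",
        "rule": "replace",
        "value": "PERSON",
        "score": 0.94,
        "input": {"start": 8, "end": 18, "value": "John Smith"},
        "output": {"start": 8, "end": 17, "value": "[PATIENT]"},
    },
]


def span_text(text: str, text_range: dict) -> str:
    """Slice text with a half-open [start, end) range."""
    return text[text_range["start"]:text_range["end"]]


def highlight(text: str, results: list, open_mark: str = "<<", close_mark: str = ">>") -> str:
    """Wrap every output span in markers."""
    out = text
    # Work right to left so earlier offsets stay valid as markers are inserted.
    for r in sorted(results, key=lambda r: r["output"]["start"], reverse=True):
        s, e = r["output"]["start"], r["output"]["end"]
        out = out[:s] + open_mark + out[s:e] + close_mark + out[e:]
    return out


for r in results:
    print(span_text(original, r["input"]), "->", span_text(anonymized, r["output"]))
print(highlight(anonymized, results))  # Patient <<[PATIENT]>> (SSN: ***-**-6789)
```

Checking that each slice equals the record's `value` field is a quick way to confirm you are applying the offsets to the right text.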
