Anonymization
Learn how to use Quba's Anonymization API to anonymize sensitive data in text using various techniques like replacement, masking, hashing, and encryption.
```typescript
import {
  Configuration,
  SensitiveDataProtectionApi,
} from "@quba/sensitive-data-protection"

const text = `Patient John Smith (SSN: 123-45-6789) was admitted to
Dubai General Hospital on 2024-01-15. Contact: john.smith@email.com`

async function protectData(text: string) {
  // Pass your API key on every request via the x-api-key header.
  const api = new SensitiveDataProtectionApi(
    new Configuration({ headers: { "x-api-key": "quba_..." } }),
  )
  return await api.anonymizeText({
    text,
    rules: [
      {
        // Swap detected person names for a fixed label
        type: "replace",
        entities: [{ type: "model", value: "person" }],
        replacement: "[PATIENT]",
      },
      {
        // Mask the first 7 characters of the SSN, leaving the last four digits visible
        type: "mask",
        entities: [{ type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }],
        masking_char: "*",
        chars_to_mask: 7,
        from_end: false,
      },
      {
        // Redact detected locations
        type: "redact",
        entities: [{ type: "model", value: "location" }],
      },
      {
        // Hash detected email addresses
        type: "sha256",
        entities: [{ type: "model", value: "email" }],
      },
    ],
  })
}

const response = await protectData(text)
console.log(response.text)
// Output: Patient [PATIENT] (SSN: ***-**-6789) was admitted to
// [REDACTED] on 2024-01-15. Contact: a1b2c3d4...
```

```python
from quba_sdp import Client, models
from quba_sdp.api import anonymize_text

text = """Patient John Smith (SSN: 123-45-6789) was admitted to
Dubai General Hospital on 2024-01-15. Contact: john.smith@email.com"""

# base_url defaults to the production API; pass your key via headers.
client = Client(headers={"x-api-key": "quba_..."})

response = anonymize_text.sync(
    client=client,
    body=models.AnonymizeRequestBody(
        text=text,
        rules=[
            models.ReplaceRule(
                entities=[models.ModelEntity(value="person")],
                replacement="[PATIENT]",
            ),
            models.MaskRule(
                entities=[models.RegexEntity(value=r"\d{3}-\d{2}-\d{4}")],
                masking_char="*",
                chars_to_mask=7,
                from_end=False,  # mask from the start, keep the last four digits
            ),
            models.RedactRule(entities=[models.ModelEntity(value="location")]),
            models.SHA256Rule(entities=[models.ModelEntity(value="email")]),
        ],
    ),
)
print(response.text)
# Output: Patient [PATIENT] (SSN: ***-**-6789) was admitted to
# [REDACTED] on 2024-01-15. Contact: a1b2c3d4...
```

Use `model` when you want the AI to find entity types automatically. Use `regex` when you need to target a specific pattern, such as an ID format or a date structure.
```typescript
// Model-based: AI finds the entity type
{ type: "model", value: "person" }
// Regex-based: matches your exact pattern
{ type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }
```

```python
# Model-based: AI finds the entity type
models.ModelEntity(value="person")
# Regex-based: matches your exact pattern
models.RegexEntity(value=r"\d{3}-\d{2}-\d{4}")
```

Rules
A rule tells the API what to detect and how to anonymize it. Every rule has two required parts:

- `type`: the anonymization operation to apply (`replace`, `redact`, `mask`, `sha256`, `sha512`, or `encrypt`)
- `entities`: one or more targets to detect, each defined by either a model or a regex pattern

```typescript
{
  type: "replace",                                // what to do
  entities: [{ type: "model", value: "person" }], // what to find
  replacement: "[PATIENT]"                        // anonymization option
}
```

You pass an array of rules. Each rule is evaluated independently against the full text, so multiple rules can match and anonymize different spans, including overlapping ones. Rules are applied in the order you list them.
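Because each rule is matched against the full text on its own, two rules can flag spans that overlap. The sketch below illustrates this locally with Python's `re` module, reusing the custom ID pattern from this page. `find_spans` and `overlaps` are hypothetical helpers, not part of the Quba SDK, and the sketch does not model how the API resolves overlapping spans when it rewrites the text.

```python
import re


def find_spans(text: str, patterns: list[str]) -> list[tuple[str, int, int]]:
    """Run every pattern independently over the full text (local illustration only)."""
    spans = []
    for pattern in patterns:
        for m in re.finditer(pattern, text):
            spans.append((pattern, m.start(), m.end()))
    return spans


def overlaps(a: tuple[str, int, int], b: tuple[str, int, int]) -> bool:
    """Half-open ranges [start, end) overlap if each starts before the other ends."""
    return a[1] < b[2] and b[1] < a[2]


text = "Ticket AB123456 opened for case 123456."
# A custom ID pattern and a looser six-digit pattern, each applied to the whole text
spans = find_spans(text, [r"[A-Z]{2}\d{6}", r"\d{6}"])
print(spans)
print(overlaps(spans[0], spans[1]))  # the ID and its digits share characters
```

The six-digit pattern matches both inside the ID and on its own, which is exactly the situation where two independent rules would target overlapping text.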
Entity Targets
Each entity in entities is either a model-based detection or a regex pattern:
| Type | Description | Example |
|---|---|---|
| `model` | AI detects the entity type | `{ type: "model", value: "person" }` |
| `regex` | Your pattern matches the text | `{ type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }` |
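Before sending a regex target, it can help to sanity-check the pattern locally. This sketch uses Python's `re` module, which is an assumption: the API's regex engine may differ in edge cases such as lookbehind or Unicode character classes. It also shows that the escaped TypeScript string and the Python raw string describe the same pattern.

```python
import re

# The TypeScript string "\\d{3}-\\d{2}-\\d{4}" and this raw string are the same regex.
SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"
assert SSN_PATTERN == "\\d{3}-\\d{2}-\\d{4}"

text = """Patient John Smith (SSN: 123-45-6789) was admitted to
Dubai General Hospital on 2024-01-15. Contact: john.smith@email.com"""

# (matched text, start offset, end offset) for every match
matches = [(m.group(), m.start(), m.end()) for m in re.finditer(SSN_PATTERN, text)]
print(matches)  # [('123-45-6789', 25, 36)]
```

Note that the date `2024-01-15` is not matched, because the pattern requires four trailing digits.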
A single rule can mix both:
```typescript
{
  type: "redact",
  entities: [
    { type: "model", value: "email" },
    { type: "regex", value: "[A-Z]{2}\\d{6}" }, // custom ID format
  ]
}
```

```python
models.RedactRule(
    entities=[
        models.ModelEntity(value="email"),
        models.RegexEntity(value=r"[A-Z]{2}\d{6}"),  # custom ID format
    ]
)
```

Confidence Threshold
Model-based detections include a score (0.0–1.0) representing the AI's confidence. The confidence_threshold parameter filters out low-confidence detections.
```typescript
await api.anonymizeText({
  text: "...",
  rules: [...],
  confidence_threshold: 0.8, // only apply rules to high-confidence detections
})
```

```python
anonymize_text.sync(
    client=client,
    body=models.AnonymizeRequestBody(
        text="...",
        rules=[...],
        confidence_threshold=0.8,  # only apply rules to high-confidence detections
    ),
)
```

- Default: `0.5`
- Regex matches are always applied; the threshold does not affect them.
- Set it higher (e.g. `0.8`) to reduce false positives, or lower to catch more entities.
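The threshold behaviour described above can be sketched as a simple filter. This is a local illustration, not the API's code: the detection records are hypothetical, and treating a score exactly equal to the threshold as passing (`>=`) is an assumption.

```python
from typing import Any


def passes_threshold(detection: dict[str, Any], threshold: float = 0.5) -> bool:
    """Regex matches always pass; model detections need score >= threshold (assumed)."""
    if detection["type"] == "regex":
        return True
    return detection["score"] >= threshold


# Hypothetical detections, for illustration only
detections = [
    {"type": "model", "value": "PERSON", "score": 0.94},
    {"type": "model", "value": "LOCATION", "score": 0.62},
    {"type": "regex", "value": r"\d{3}-\d{2}-\d{4}"},
]

kept_default = [d["value"] for d in detections if passes_threshold(d)]
kept_strict = [d["value"] for d in detections if passes_threshold(d, 0.8)]
print(kept_default)  # ['PERSON', 'LOCATION', '\\d{3}-\\d{2}-\\d{4}']
print(kept_strict)   # ['PERSON', '\\d{3}-\\d{2}-\\d{4}']
```

Raising the threshold to `0.8` drops the lower-scoring model detection, while the regex match survives regardless.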
Anonymization Types
Replace
Replace detected entities with a custom string:
```typescript
{
  type: "replace",
  entities: [{ type: "model", value: "person" }],
  replacement: "[REDACTED]" // default: "****"
}
```

```python
models.ReplaceRule(
    entities=[models.ModelEntity(value="person")],
    replacement="[REDACTED]",  # default: "****"
)
```

Redact
Remove detected entities entirely:
```typescript
{
  type: "redact",
  entities: [{ type: "model", value: "email" }]
}
```

```python
models.RedactRule(entities=[models.ModelEntity(value="email")])
```

Mask
Mask part of the detected entity:
```typescript
{
  type: "mask",
  entities: [{ type: "model", value: "phone" }],
  masking_char: "*", // default: "*"
  chars_to_mask: 8,  // null = mask all characters
  from_end: true     // false = mask from start
}
// +1-555-123-4567 → +1-555-***-****
```

```python
models.MaskRule(
    entities=[models.ModelEntity(value="phone")],
    masking_char="*",  # default: "*"
    chars_to_mask=8,  # None = mask all characters
    from_end=True,  # False = mask from start
)
# +1-555-123-4567 → +1-555-***-****
```

SHA256 / SHA512
Hash detected entities:
```typescript
{
  type: "sha256", // or "sha512"
  entities: [{ type: "model", value: "id" }]
}
```

```python
models.SHA256Rule(entities=[models.ModelEntity(value="id")])
# or models.SHA512Rule(...)
```

Encrypt
Encrypt detected entities with a key:
```typescript
{
  type: "encrypt",
  entities: [{ type: "model", value: "person" }],
  key: "your-encryption-key" // default: ""
}
```

```python
models.EncryptRule(
    entities=[models.ModelEntity(value="person")],
    key="your-encryption-key",  # default: ""
)
```

Regex Patterns
You can also target text using regex patterns:
```typescript
{
  type: "replace",
  entities: [{ type: "regex", value: "\\d{4}-\\d{2}-\\d{2}" }],
  replacement: "[DATE]"
}
```

```python
models.ReplaceRule(
    entities=[models.RegexEntity(value=r"\d{4}-\d{2}-\d{2}")],
    replacement="[DATE]",
)
```

Rule Application Results
When anonymization runs, every entity that was detected and anonymized produces a result record. These records are returned in response.results as an ordered audit trail — one entry per matched span.
```typescript
const response = await api.anonymizeText({ text, rules })
console.log(response.text)    // the anonymized string
console.log(response.results) // one record per anonymization applied
```

```python
response = anonymize_text.sync(
    client=client,
    body=models.AnonymizeRequestBody(text=text, rules=rules),
)
print(response.text)  # the anonymized string
print(response.results)  # one record per anonymization applied
```

Each result tells you:
- which rule was applied (`rule`)
- what was detected (`value`: the entity type or regex pattern)
- where it was in the original text (`input`)
- where it ended up in the anonymized text (`output`)
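Since there is one record per anonymized span, `response.results` doubles as an audit log. A minimal sketch that tallies records by rule type, assuming each record is available as a plain dict with the documented `rule` field (the SDK may return typed objects instead, in which case read the attribute). `tally_by_rule` is a hypothetical helper, and the records are abbreviated versions of the examples on this page.

```python
from collections import Counter
from typing import Any, Iterable


def tally_by_rule(results: Iterable[dict[str, Any]]) -> Counter:
    """Count how many spans each rule type anonymized."""
    return Counter(r["rule"] for r in results)


# Abbreviated records shaped like the result examples on this page
results = [
    {"type": "model", "rule": "replace", "value": "PERSON"},
    {"type": "regex", "rule": "mask", "value": r"\d{3}-\d{2}-\d{4}"},
    {"type": "model", "rule": "redact", "value": "LOCATION"},
]
print(tally_by_rule(results))
```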
A result is either a model result (AI detection) or a regex result (pattern match), identified by type:
Model Result
```typescript
{
  type: "model",
  rule: "replace",  // rule type that was applied
  value: "PERSON",  // entity type the model detected
  score: 0.94,      // AI confidence (0.0–1.0)
  input: { start: 8, end: 18, value: "John Smith" }, // position in original text
  output: { start: 8, end: 17, value: "[PATIENT]" }, // position in result text
}
```

Regex Result

```typescript
{
  type: "regex",
  rule: "mask",                  // rule type that was applied
  value: "\\d{3}-\\d{2}-\\d{4}", // pattern that matched
  input: { start: 25, end: 36, value: "123-45-6789" },  // position in original text
  output: { start: 24, end: 35, value: "***-**-6789" }, // position in result text
}
```

TextRange Fields
| Field | Type | Description |
|---|---|---|
| `start` | number | Start character offset (inclusive) |
| `end` | number | End character offset (exclusive) |
| `value` | string | The substring at `[start, end)` |
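Because `end` is exclusive, a TextRange maps directly onto Python slicing (and JavaScript's `slice`), at least for ASCII text; how the API counts offsets for non-ASCII characters is not specified here. A minimal check using the first line of the running example and the Model Result ranges above, with a hypothetical helper `range_is_consistent`:

```python
# First line of the running example, before and after anonymization
ORIGINAL = "Patient John Smith (SSN: 123-45-6789)"
ANONYMIZED = "Patient [PATIENT] (SSN: ***-**-6789)"


def range_is_consistent(text: str, rng: dict) -> bool:
    """A TextRange is half-open: text[start:end] must equal value."""
    return text[rng["start"]:rng["end"]] == rng["value"]


# input/output ranges copied from the Model Result example
model_input = {"start": 8, "end": 18, "value": "John Smith"}
model_output = {"start": 8, "end": 17, "value": "[PATIENT]"}

print(range_is_consistent(ORIGINAL, model_input))     # True
print(range_is_consistent(ANONYMIZED, model_output))  # True
```

A useful corollary of the half-open convention is that `end - start` always equals the length of `value`.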
Results are ordered by position in the output text. Use input offsets to map back to the original text, or output offsets to highlight spans in the anonymized result.
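For example, to mark anonymized spans in a review UI, you can wrap each `output` range in tags. A minimal sketch with a hypothetical `highlight` helper; it processes ranges right to left so that inserting tags never shifts the offsets of ranges not yet processed, and it assumes ranges do not overlap.

```python
def highlight(text: str, ranges: list[tuple[int, int]],
              open_tag: str = "<mark>", close_tag: str = "</mark>") -> str:
    """Wrap each half-open [start, end) range of text in open_tag/close_tag."""
    out = text
    # Right to left: earlier offsets stay valid after each insertion
    for start, end in sorted(ranges, reverse=True):
        out = out[:start] + open_tag + out[start:end] + close_tag + out[end:]
    return out


anonymized = "Patient [PATIENT] (SSN: ***-**-6789)"
# (8, 17) is the output range from the Model Result example;
# the SSN range is computed here for brevity.
ssn_start = anonymized.index("***-**-6789")
ranges = [(8, 17), (ssn_start, ssn_start + len("***-**-6789"))]
print(highlight(anonymized, ranges))
# Patient <mark>[PATIENT]</mark> (SSN: <mark>***-**-6789</mark>)
```

The helper does not HTML-escape the surrounding text; escape it before rendering untrusted input.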