Anonymization
Learn how to use Quba's Anonymization API to transform sensitive data in text using various techniques like replacement, masking, hashing, and encryption.
import { SensitiveDataProtectionApi } from "@quba/sensitive-data-protection"

const text = `Patient John Smith (SSN: 123-45-6789) was admitted to
Dubai General Hospital on 2024-01-15. Contact: john.smith@email.com`

async function protectData(text: string) {
  const api = new SensitiveDataProtectionApi()
  return await api.anonymizeText({
    text,
    rules: [
      {
        type: "replace",
        entities: [{ type: "model", value: "person" }],
        replacement: "[PATIENT]",
      },
      {
        type: "mask",
        entities: [{ type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }],
        masking_char: "*",
        chars_to_mask: 7,
        from_end: false, // mask from the start so the last four digits stay visible
      },
      {
        type: "redact",
        entities: [{ type: "model", value: "location" }],
      },
      {
        type: "sha256",
        entities: [{ type: "model", value: "email" }],
      },
    ],
  })
}

const response = await protectData(text)
console.log(response.text)
// Output: Patient [PATIENT] (SSN: ***-**-6789) was admitted to
// [REDACTED] on 2024-01-15. Contact: a1b2c3d4...

Rules
A rule tells the API what to detect and how to transform it. Every rule has two required parts:
- type: the transformation to apply (replace, redact, mask, sha256, sha512, encrypt)
- entities: one or more targets to detect, each using either a model or a regex pattern
{
  type: "replace",                                // what to do
  entities: [{ type: "model", value: "person" }], // what to find
  replacement: "[PATIENT]"                        // transformation option
}

You pass an array of rules. Each rule is evaluated independently against the full text, so multiple rules can match and transform different spans, including overlapping ones. Rules are applied in order.
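To make the rule shape concrete, here is a minimal TypeScript sketch of the structure described above, with a small pre-flight check for the two required parts. The names `EntityTarget`, `Rule`, and `checkRules` are illustrative assumptions, not exports of the Quba SDK; only the field names and rule types come from this page.

```typescript
// Hypothetical local types mirroring the rule shape shown on this page.
// They are NOT part of the Quba SDK; the names are illustrative only.
type EntityTarget = { type: "model" | "regex"; value: string }

type RuleType = "replace" | "redact" | "mask" | "sha256" | "sha512" | "encrypt"

interface Rule {
  type: RuleType
  entities: EntityTarget[]
  [option: string]: unknown // transformation options such as replacement or key
}

const KNOWN_TYPES: RuleType[] = ["replace", "redact", "mask", "sha256", "sha512", "encrypt"]

// Returns a list of human-readable problems. An empty list means every rule
// has a known type and at least one entity target.
function checkRules(rules: Rule[]): string[] {
  const problems: string[] = []
  rules.forEach((rule, i) => {
    if (!KNOWN_TYPES.includes(rule.type)) {
      problems.push(`rule ${i}: unknown type "${rule.type}"`)
    }
    if (rule.entities.length === 0) {
      problems.push(`rule ${i}: no entities`)
    }
  })
  return problems
}

const exampleRules: Rule[] = [
  { type: "replace", entities: [{ type: "model", value: "person" }], replacement: "[PATIENT]" },
  { type: "redact", entities: [{ type: "model", value: "location" }] },
]

console.log(checkRules(exampleRules).length) // 0
```

A check like this only catches structural mistakes before a request is sent. Whether the service accepts a given option for a given rule type is still decided by the API.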
Entity Targets
Each entity in entities is either a model-based detection or a regex pattern:
| Type | Description | Example |
|---|---|---|
| model | AI detects the entity type | { type: "model", value: "person" } |
| regex | Your pattern matches the text | { type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" } |
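Because a regex target is passed as a string, backslashes must be doubled in TypeScript source. The sketch below tries a pattern locally before it goes into a rule. It assumes the pattern behaves the same in JavaScript's `RegExp` as it does on the server, which this page does not state, and `findMatches` is a hypothetical helper rather than an SDK function.

```typescript
interface Span {
  start: number // inclusive
  end: number   // exclusive
  value: string
}

// Local sanity check for a regex entity value. This runs JavaScript's own
// RegExp engine; the server-side regex flavor is an assumption here.
function findMatches(pattern: string, text: string): Span[] {
  const re = new RegExp(pattern, "g")
  const spans: Span[] = []
  let m: RegExpExecArray | null
  while ((m = re.exec(text)) !== null) {
    spans.push({ start: m.index, end: m.index + m[0].length, value: m[0] })
    if (m[0].length === 0) re.lastIndex++ // avoid looping forever on empty matches
  }
  return spans
}

const ssnTarget = { type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }
const found = findMatches(ssnTarget.value, "SSN: 123-45-6789")
console.log(found.length, found[0].start, found[0].end, found[0].value)
// 1 5 16 123-45-6789
```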
A single rule can mix both:
{
  type: "redact",
  entities: [
    { type: "model", value: "email" },
    { type: "regex", value: "[A-Z]{2}\\d{6}" }, // custom ID format
  ]
}

Transformation Types
Replace
Replace detected entities with a custom string:
{
  type: "replace",
  entities: [{ type: "model", value: "person" }],
  replacement: "[REDACTED]" // default: "****"
}

Redact
Remove detected entities entirely:
{
  type: "redact",
  entities: [{ type: "model", value: "email" }]
}

Mask
Mask part of the detected entity:
{
  type: "mask",
  entities: [{ type: "model", value: "phone" }],
  masking_char: "*", // default: "*"
  chars_to_mask: 8,  // null = mask all characters
  from_end: true     // false = mask from start
}
// +1-555-123-4567 → +1-555-***-****

SHA256 / SHA512
Hash detected entities:
{
  type: "sha256",
  entities: [{ type: "model", value: "id" }]
}

Encrypt
Encrypt detected entities with a key:
{
  type: "encrypt",
  entities: [{ type: "model", value: "person" }],
  key: "your-encryption-key" // default: ""
}

Regex Patterns
You can also target text using regex patterns:
{
  type: "replace",
  entities: [{ type: "regex", value: "\\d{4}-\\d{2}-\\d{2}" }],
  replacement: "[DATE]"
}

Rule Application Results
When anonymization runs, every entity that was detected and transformed produces a result record. These records are returned in response.results as an ordered audit trail — one entry per matched span.
const response = await api.anonymizeText({ text, rules })
console.log(response.text)    // the anonymized string
console.log(response.results) // one record per transformation applied

Each result tells you:
- which rule was applied (rule)
- what was detected (value: entity type or regex pattern)
- where it was in the original text (input)
- where it ended up in the anonymized text (output)
A result is either a model result (AI detection) or a regex result (pattern match), identified by type:
Model Result
{
  type: "model",
  rule: "replace",                                    // rule type that was applied
  value: "PERSON",                                    // entity type the model detected
  score: 0.94,                                        // AI confidence (0.0–1.0)
  input: { start: 8, end: 18, value: "John Smith" },  // position in original text
  output: { start: 8, end: 17, value: "[PATIENT]" },  // position in result text
}

Regex Result
{
  type: "regex",
  rule: "mask",                   // rule type that was applied
  value: "\\d{3}-\\d{2}-\\d{4}",  // pattern that matched
  input: { start: 25, end: 36, value: "123-45-6789" },  // position in original text
  output: { start: 24, end: 35, value: "***-**-6789" }, // position in result text (shifted by the shorter [PATIENT] replacement)
}

TextRange Fields
| Field | Type | Description |
|---|---|---|
| start | number | Start character offset (inclusive) |
| end | number | End character offset (exclusive) |
| value | string | The substring at [start, end) |
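The result shapes above map naturally onto a discriminated union keyed on `type`. The following sketch uses hypothetical local type names (`TextRange`, `ModelResult`, `RegexResult`, `RuleResult`), since the SDK may export its own definitions under different names. It also assumes that offsets count JavaScript string units (UTF-16 code units), which this page does not specify and which matters only for text outside the Basic Multilingual Plane.

```typescript
// Hypothetical local types based on the result examples on this page.
interface TextRange {
  start: number // inclusive
  end: number   // exclusive
  value: string
}

interface ModelResult {
  type: "model"
  rule: string
  value: string
  score: number
  input: TextRange
  output: TextRange
}

interface RegexResult {
  type: "regex"
  rule: string
  value: string
  input: TextRange
  output: TextRange
}

type RuleResult = ModelResult | RegexResult

// True when range.value is exactly the substring of text at [start, end).
function rangeMatches(text: string, range: TextRange): boolean {
  return text.slice(range.start, range.end) === range.value
}

// Narrowing on `type` gives access to `score` only for model results.
function describe(result: RuleResult): string {
  const how = result.type === "model" ? `model, score ${result.score}` : "regex"
  return `${result.rule} on ${result.value} (${how})`
}

const original = "Patient John Smith (SSN: 123-45-6789)"
const personResult: ModelResult = {
  type: "model",
  rule: "replace",
  value: "PERSON",
  score: 0.94,
  input: { start: 8, end: 18, value: "John Smith" },
  output: { start: 8, end: 17, value: "[PATIENT]" },
}

console.log(rangeMatches(original, personResult.input)) // true
console.log(describe(personResult)) // replace on PERSON (model, score 0.94)
```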
Results are ordered by position in the output text. Use input offsets to map back to the original text, or output offsets to highlight spans in the anonymized result.
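As an illustration of the second use, the sketch below wraps each output span in markers so the transformed regions stand out. The `highlight` helper and the `<<` / `>>` markers are hypothetical, and the offsets in the sample data were computed by hand for this shortened string rather than taken from a real API response.

```typescript
interface TextRange {
  start: number // inclusive
  end: number   // exclusive
  value: string
}

// Wraps each span of `text` in the given markers. Spans are sorted by start
// offset; a span that overlaps one already emitted is skipped.
function highlight(text: string, spans: TextRange[], open = "<<", close = ">>"): string {
  const sorted = [...spans].sort((a, b) => a.start - b.start)
  let out = ""
  let cursor = 0
  for (const span of sorted) {
    if (span.start < cursor) continue // overlapping span, skip it
    out += text.slice(cursor, span.start) + open + text.slice(span.start, span.end) + close
    cursor = span.end
  }
  return out + text.slice(cursor)
}

// Hand-built sample: the anonymized text and the output ranges of two results.
const anonymized = "Patient [PATIENT] (SSN: ***-**-6789)"
const outputRanges: TextRange[] = [
  { start: 8, end: 17, value: "[PATIENT]" },
  { start: 24, end: 35, value: "***-**-6789" },
]

console.log(highlight(anonymized, outputRanges))
// Patient <<[PATIENT]>> (SSN: <<***-**-6789>>)
```

The same function works against the original text if you pass the input ranges instead, which is one way to show reviewers what was detected before transformation.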