Quba Documentation

Anonymization

Learn how to use Quba's Anonymization API to transform sensitive data in text using techniques such as replacement, redaction, masking, hashing, and encryption.

import { SensitiveDataProtectionApi } from "@quba/sensitive-data-protection"

const text = `Patient John Smith (SSN: 123-45-6789) was admitted to
  Dubai General Hospital on 2024-01-15. Contact: john.smith@email.com`

async function protectData(text: string) {
  const api = new SensitiveDataProtectionApi()

  return await api.anonymizeText({
    text,
    rules: [
      // Replace person names with a fixed label
      {
        type: "replace",
        entities: [{ type: "model", value: "person" }],
        replacement: "[PATIENT]",
      },
      // Mask the first 7 characters of each SSN-shaped match
      {
        type: "mask",
        entities: [{ type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }],
        masking_char: "*",
        chars_to_mask: 7,
        from_end: false,
      },
      // Remove locations from the text entirely
      {
        type: "redact",
        entities: [{ type: "model", value: "location" }],
      },
      // Replace email addresses with their SHA-256 hash
      {
        type: "sha256",
        entities: [{ type: "model", value: "email" }],
      },
    ],
  })
}

const response = await protectData(text)
console.log(response.text)
// Output (the redacted location is removed, not replaced):
// Patient [PATIENT] (SSN: ***-**-6789) was admitted to
//    on 2024-01-15. Contact: a1b2c3d4...

Rules

A rule tells the API what to detect and how to transform it. Every rule has two required parts:

  • type — the transformation to apply (replace, redact, mask, sha256, sha512, encrypt)
  • entities — one or more targets to detect, each using either a model or a regex pattern
{
  type: "replace",                                    // what to do
  entities: [{ type: "model", value: "person" }],     // what to find
  replacement: "[PATIENT]"                            // transformation option
}
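If you write rules in TypeScript, it can help to model their shape as a union keyed on type, so each transformation only carries its own options. The sketch below is written from the fields documented on this page; the type names EntityTarget and AnonymizationRule are hypothetical and are not claimed to be exported by @quba/sensitive-data-protection.

```typescript
// Hypothetical local types modeled on the rule fields documented on this page.
// They are not taken from the Quba SDK.
type EntityTarget =
  | { type: "model"; value: string } // AI-detected entity type, e.g. "person"
  | { type: "regex"; value: string } // regular-expression pattern source

type AnonymizationRule =
  | { type: "replace"; entities: EntityTarget[]; replacement?: string }
  | { type: "redact"; entities: EntityTarget[] }
  | {
      type: "mask"
      entities: EntityTarget[]
      masking_char?: string
      chars_to_mask?: number | null
      from_end?: boolean
    }
  | { type: "sha256" | "sha512"; entities: EntityTarget[] }
  | { type: "encrypt"; entities: EntityTarget[]; key?: string }

// Annotating the array lets the compiler check each rule's fields.
const rules: AnonymizationRule[] = [
  {
    type: "replace",
    entities: [{ type: "model", value: "person" }],
    replacement: "[PATIENT]",
  },
  { type: "sha256", entities: [{ type: "model", value: "email" }] },
]

// Collect the transformation types in the order the rules are listed.
const ruleTypes = rules.map((rule) => rule.type)
console.log(ruleTypes)
```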

You pass an array of rules. Each rule is evaluated independently against the full text, so multiple rules can match and transform different spans, including overlapping ones. Rules are applied in the order they appear in the array.

Entity Targets

Each entity in entities is either a model-based detection or a regex pattern:

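| Type  | Description                    | Example                                           |
|-------|--------------------------------|---------------------------------------------------|
| model | AI detects the entity type     | { type: "model", value: "person" }                |
| regex | Your pattern matches the text  | { type: "regex", value: "\\d{3}-\\d{2}-\\d{4}" }  |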

A single rule can mix both:

{
  type: "redact",
  entities: [
    { type: "model", value: "email" },
    { type: "regex", value: "[A-Z]{2}\\d{6}" },  // custom ID format
  ]
}
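Regex entity values are plain strings, so backslashes are doubled: the string "[A-Z]{2}\\d{6}" holds the pattern [A-Z]{2}\d{6}. Before sending a custom pattern, you can preview what it matches locally. This is a minimal sketch that assumes the service interprets patterns much like JavaScript's RegExp (the dialect is not specified on this page); findRegexSpans is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical local helper for previewing regex entity matches.
// Offsets use start inclusive, end exclusive.
interface Span {
  start: number
  end: number
  value: string
}

function findRegexSpans(text: string, pattern: string): Span[] {
  // The "g" flag is required by matchAll; the pattern string is used as-is.
  const regex = new RegExp(pattern, "g")
  const spans: Span[] = []
  for (const match of text.matchAll(regex)) {
    const start = match.index ?? 0
    spans.push({ start, end: start + match[0].length, value: match[0] })
  }
  return spans
}

// "IDs" is not followed by six digits, so only the two IDs match.
const spans = findRegexSpans("IDs: AB123456 and CD654321", "[A-Z]{2}\\d{6}")
console.log(spans)
```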

Transformation Types

Replace

Replace detected entities with a custom string:

{
  type: "replace",
  entities: [{ type: "model", value: "person" }],
  replacement: "[REDACTED]"   // default: "****"
}
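To reason about what a replace rule produces, the following local sketch substitutes a fixed string into spans that have already been detected, using the documented default of "****". The helper name and the end-to-start processing order are assumptions for illustration, not a description of how the service works internally; it also assumes the spans do not overlap.

```typescript
interface Span {
  start: number // inclusive
  end: number // exclusive
}

// Hypothetical local helper: replace each non-overlapping span with a fixed
// string. Spans are processed from the end of the text backwards so earlier
// offsets stay valid as the string length changes.
function replaceSpans(text: string, spans: Span[], replacement = "****"): string {
  const ordered = [...spans].sort((a, b) => b.start - a.start)
  let result = text
  for (const { start, end } of ordered) {
    result = result.slice(0, start) + replacement + result.slice(end)
  }
  return result
}

const input = "Patient John Smith was seen by Dr. Ana Lee."
const replaced = replaceSpans(
  input,
  [
    { start: 8, end: 18 }, // "John Smith"
    { start: 35, end: 42 }, // "Ana Lee"
  ],
  "[REDACTED]",
)
console.log(replaced) // Patient [REDACTED] was seen by Dr. [REDACTED].
```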

Redact

Remove detected entities entirely:

{
  type: "redact",
  entities: [{ type: "model", value: "email" }]
}
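Because redaction deletes the span without inserting anything, the whitespace around it stays where it was. The local sketch below makes this visible; redactSpans is a hypothetical helper, and whether the service itself trims or collapses the leftover spaces is not documented on this page.

```typescript
interface Span {
  start: number // inclusive
  end: number // exclusive
}

// Hypothetical local helper: delete each non-overlapping span.
// Works backwards so earlier offsets are unaffected by deletions.
function redactSpans(text: string, spans: Span[]): string {
  const ordered = [...spans].sort((a, b) => b.start - a.start)
  let result = text
  for (const { start, end } of ordered) {
    result = result.slice(0, start) + result.slice(end)
  }
  return result
}

// Removing the email leaves the two spaces that surrounded it.
const redacted = redactSpans("Contact: john.smith@email.com today", [{ start: 9, end: 29 }])
console.log(JSON.stringify(redacted)) // "Contact:  today"
```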

Mask

Replace some or all characters of each detected entity with a masking character:

{
  type: "mask",
  entities: [{ type: "model", value: "phone" }],
  masking_char: "*",     // default: "*"
  chars_to_mask: 8,      // null = mask all characters
  from_end: true         // false = mask from start
}
// +1-555-123-4567 → +1-555-***-****
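The examples on this page keep separators visible: +1-555-123-4567 becomes +1-555-***-****, and 123-45-6789 masked from the start becomes ***-**-6789. The local sketch below reproduces both by counting every character toward chars_to_mask but replacing only letters and digits. That separator rule, the from_end default of false, and the mask function itself are assumptions inferred from the examples, not documented behavior.

```typescript
interface MaskOptions {
  masking_char?: string
  chars_to_mask?: number | null // null masks every character
  from_end?: boolean
}

// Hypothetical local re-creation of the documented mask options.
// All characters count toward the window; only letters and digits inside
// the window are replaced (assumed, to match the examples).
function mask(
  value: string,
  { masking_char = "*", chars_to_mask = null, from_end = false }: MaskOptions = {},
): string {
  const count = chars_to_mask === null ? value.length : Math.min(chars_to_mask, value.length)
  const windowStart = from_end ? value.length - count : 0
  const windowEnd = windowStart + count
  let out = ""
  for (let i = 0; i < value.length; i++) {
    const ch = value[i]
    const inWindow = i >= windowStart && i < windowEnd
    out += inWindow && /[\p{L}\p{N}]/u.test(ch) ? masking_char : ch
  }
  return out
}

console.log(mask("+1-555-123-4567", { chars_to_mask: 8, from_end: true })) // +1-555-***-****
console.log(mask("123-45-6789", { chars_to_mask: 7, from_end: false })) // ***-**-6789
```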

SHA256 / SHA512

Replace detected entities with their SHA-256 hash (or use type "sha512" for SHA-512):

{
  type: "sha256",
  entities: [{ type: "model", value: "id" }]
}
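Hashing is one-way and deterministic: the same input always yields the same digest, so hashed values can still be compared or counted without revealing the original. The sketch below demonstrates this with Node's built-in crypto module. Whether Quba hex-encodes the digest, truncates it, or adds a salt is not stated on this page, so the service's output should not be expected to match these digests; digest is a hypothetical helper.

```typescript
import { createHash } from "node:crypto"

// Hypothetical helper: hex-encoded SHA-256 or SHA-512 digest of a string.
function digest(value: string, algorithm: "sha256" | "sha512"): string {
  return createHash(algorithm).update(value, "utf8").digest("hex")
}

const a = digest("john.smith@email.com", "sha256")
const b = digest("john.smith@email.com", "sha256")
console.log(a === b) // true: identical inputs give identical digests
console.log(a.length, digest("john.smith@email.com", "sha512").length) // 64 128
```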

Encrypt

Encrypt detected entities with a key:

{
  type: "encrypt",
  entities: [{ type: "model", value: "person" }],
  key: "your-encryption-key"   // default: ""
}
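Unlike hashing, encryption can be reversed by anyone holding the key, so encrypted spans can be restored later. This page does not say which cipher Quba uses. As a swapped-in illustration only, the sketch below uses AES-256-GCM from Node's crypto module and derives a 32-byte key from the passphrase with scrypt; the helper names, the fixed salt, and the iv:tag:ciphertext encoding are all hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto"

// Derive a 32-byte AES key from a passphrase. The fixed salt keeps the
// example short; a real system would store a random salt per key.
function deriveKey(passphrase: string): Buffer {
  return scryptSync(passphrase, "example-salt", 32)
}

// Encrypt with AES-256-GCM and pack iv, auth tag, and ciphertext as base64.
function encryptValue(plaintext: string, passphrase: string): string {
  const iv = randomBytes(12)
  const cipher = createCipheriv("aes-256-gcm", deriveKey(passphrase), iv)
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()])
  const tag = cipher.getAuthTag()
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(":")
}

// Reverse encryptValue; throws if the key is wrong or the data was altered.
function decryptValue(packed: string, passphrase: string): string {
  const [iv, tag, ciphertext] = packed.split(":").map((part) => Buffer.from(part, "base64"))
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(passphrase), iv)
  decipher.setAuthTag(tag)
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8")
}

const token = encryptValue("John Smith", "your-encryption-key")
console.log(decryptValue(token, "your-encryption-key")) // John Smith
```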

Regex Patterns

You can also target text using regex patterns:

{
  type: "replace",
  entities: [{ type: "regex", value: "\\d{4}-\\d{2}-\\d{2}" }],
  replacement: "[DATE]"
}
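Two details of this rule are easy to miss: the doubled backslashes are string escapes, and the pattern matches the shape of a date rather than a valid date. The local preview below shows both, assuming JavaScript RegExp semantics (the service's regex dialect is not specified on this page).

```typescript
// The rule's pattern, written exactly as in the rule object above.
const datePattern = "\\d{4}-\\d{2}-\\d{2}"

// The string escape "\\d" yields the two characters \d, so this is the
// same pattern as the regex literal /\d{4}-\d{2}-\d{2}/.
console.log(datePattern === /\d{4}-\d{2}-\d{2}/.source) // true

// Local preview: the impossible date 2024-13-99 is replaced too,
// because only the digit layout is checked.
const preview = "Admitted 2024-01-15, discharged 2024-13-99.".replace(
  new RegExp(datePattern, "g"),
  "[DATE]",
)
console.log(preview) // Admitted [DATE], discharged [DATE].
```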

Rule Application Results

When anonymization runs, every entity that was detected and transformed produces a result record. These records are returned in response.results as an ordered audit trail — one entry per matched span.

const response = await api.anonymizeText({ text, rules })

console.log(response.text)     // the anonymized string
console.log(response.results)  // one record per transformation applied

Each result tells you:

  • which rule was applied (rule)
  • what was detected (value — entity type or regex pattern)
  • where it was in the original text (input)
  • where it ended up in the anonymized text (output)

A result is either a model result (AI detection) or a regex result (pattern match), identified by type:

Model Result

{
  type: "model",
  rule: "replace",            // rule type that was applied
  value: "PERSON",            // entity type the model detected
  score: 0.94,                // AI confidence (0.0–1.0)
  input:  { start: 8, end: 18, value: "John Smith" },   // position in original text
  output: { start: 8, end: 17, value: "[PATIENT]"  },   // position in result text
}

Regex Result

{
  type: "regex",
  rule: "mask",                                          // rule type that was applied
  value: "\\d{3}-\\d{2}-\\d{4}",                        // pattern that matched
  input:  { start: 25, end: 36, value: "123-45-6789" }, // position in original text
  output: { start: 24, end: 35, value: "***-**-6789" }, // position in result text
}
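Since the two result kinds share most fields and differ on type, a TypeScript union lets you reach model-only fields such as score after checking type. The types below are a sketch written from the fields shown above; the names ModelResult, RegexResult, RuleResult, and describe are hypothetical, and the SDK's own type names may differ.

```typescript
// Hypothetical local types for records in response.results.
interface TextRange {
  start: number // inclusive
  end: number // exclusive
  value: string
}

interface ModelResult {
  type: "model"
  rule: string
  value: string // entity type, e.g. "PERSON"
  score: number // confidence between 0.0 and 1.0
  input: TextRange
  output: TextRange
}

interface RegexResult {
  type: "regex"
  rule: string
  value: string // the pattern that matched
  input: TextRange
  output: TextRange
}

type RuleResult = ModelResult | RegexResult

// Narrow on `type` before reading model-only fields such as `score`.
function describe(result: RuleResult): string {
  const what =
    result.type === "model"
      ? `${result.value} (score ${result.score.toFixed(2)})`
      : `pattern ${result.value}`
  return `${result.rule}: "${result.input.value}" -> "${result.output.value}" [${what}]`
}

const example: RuleResult = {
  type: "model",
  rule: "replace",
  value: "PERSON",
  score: 0.94,
  input: { start: 8, end: 18, value: "John Smith" },
  output: { start: 8, end: 17, value: "[PATIENT]" },
}
console.log(describe(example)) // replace: "John Smith" -> "[PATIENT]" [PERSON (score 0.94)]
```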

TextRange Fields

| Field | Type   | Description                          |
|-------|--------|--------------------------------------|
| start | number | Start character offset (inclusive)   |
| end   | number | End character offset (exclusive)     |
| value | string | The substring at [start, end)        |

Results are ordered by position in the output text. Use input offsets to map back to the original text, or output offsets to highlight spans in the anonymized result.
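Here is a sketch of both uses of the offsets: checking that input ranges point back at the original text, and wrapping output ranges in highlight markers. The records are for the first line of the introductory example, with offsets counted by hand for this sketch; rangeMatches, highlightOutput, and the <mark> markers are hypothetical.

```typescript
interface TextRange {
  start: number // inclusive
  end: number // exclusive
  value: string
}

interface ResultRecord {
  input: TextRange
  output: TextRange
}

// True if the range's offsets really select its value in the given text.
function rangeMatches(text: string, range: TextRange): boolean {
  return text.slice(range.start, range.end) === range.value
}

// Wrap each output span in <mark> tags. Spans are handled from last to first
// so inserted markers do not shift offsets that are still to be processed.
function highlightOutput(anonymized: string, results: ResultRecord[]): string {
  let marked = anonymized
  const ordered = [...results].sort((a, b) => b.output.start - a.output.start)
  for (const { output } of ordered) {
    marked =
      marked.slice(0, output.start) +
      "<mark>" + marked.slice(output.start, output.end) + "</mark>" +
      marked.slice(output.end)
  }
  return marked
}

const original = "Patient John Smith (SSN: 123-45-6789)"
const anonymized = "Patient [PATIENT] (SSN: ***-**-6789)"
const results: ResultRecord[] = [
  { input: { start: 8, end: 18, value: "John Smith" }, output: { start: 8, end: 17, value: "[PATIENT]" } },
  { input: { start: 25, end: 36, value: "123-45-6789" }, output: { start: 24, end: 35, value: "***-**-6789" } },
]

console.log(results.every((r) => rangeMatches(original, r.input) && rangeMatches(anonymized, r.output))) // true
console.log(highlightOutput(anonymized, results))
// Patient <mark>[PATIENT]</mark> (SSN: <mark>***-**-6789</mark>)
```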