Scanning Text

Learn how to use Quba's Scan API to detect sensitive entities in text with confidence scores.

Overview

The Scan API detects sensitive entities in text and returns their positions, types, and confidence scores. Use this to identify personal, confidential, or regulated information before deciding how to handle it.

// `api` is an initialized Quba API client (setup not shown)
const response = await api.scanText({
  text: "Patient John Smith (ID: 12345) was treated at Dubai Hospital",
  language: "en",
  entities: ["person", "id", "location"],
  confidence_threshold: 0.5,
})

# `client`, `models`, and `scan_text` come from the Quba Python client (imports not shown)
response = scan_text.sync(
    client=client,
    body=models.ScanRequestBody(
        text="Patient John Smith (ID: 12345) was treated at Dubai Hospital",
        language="en",
        entities=["person", "id", "location"],
        confidence_threshold=0.5,
    ),
)

Parameters

Parameter	Type	Default	Description
`text`	string	—	Input text to scan
`language`	string	`"en"`	Language code
`entities`	string[]	`["id","name","email","location"]`	Entity types to detect
`confidence_threshold`	number	—	Minimum confidence score (0.0–1.0)
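The Scan API applies `confidence_threshold` to the request. If you already have results and want to see what a stricter cutoff would keep without scanning again, filtering locally is enough. This is a minimal sketch, not part of Quba's SDK: `Detection` is a hypothetical stand-in for one item of `response.results`, `filter_by_score` is a hypothetical helper, and treating the cutoff as inclusive (`>=`) is an assumption. The offsets match the overview sentence; the scores are illustrative.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical stand-in for one item of response.results."""
    start: int
    end: int
    score: float
    entity_type: str


def filter_by_score(results: list[Detection], min_score: float) -> list[Detection]:
    """Keep detections scoring at least min_score (inclusive, by assumption)."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    return [r for r in results if r.score >= min_score]


# Illustrative values, not real API output.
results = [
    Detection(start=8, end=18, score=0.92, entity_type="person"),
    Detection(start=24, end=29, score=0.61, entity_type="id"),
    Detection(start=46, end=60, score=0.55, entity_type="location"),
]
strict = filter_by_score(results, 0.8)
print([r.entity_type for r in strict])  # ['person']
```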

Scan Results

A scan result represents a detection: an entity the model found in the text. No rule has been applied to it; it only tells you what was found and where. This distinguishes it from a rule application result, which records what was transformed after a rule ran.

Use scan results to:

Audit what sensitive data exists in a text before processing it
Decide which rules to apply in a subsequent anonymize call
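For auditing, grouping the matched substrings by entity type gives a quick picture of what a text contains, and the set of types found is a starting point for choosing rules. This is a minimal sketch under assumptions: `Detection` and `summarize` are hypothetical names whose fields mirror those described under Response, and the offsets are computed by hand for the overview sentence rather than returned by the API. The anonymize call itself is not shown.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical stand-in for one item of response.results."""
    start: int
    end: int
    score: float
    entity_type: str


def summarize(text: str, results: list[Detection]) -> dict[str, list[str]]:
    """Group matched substrings by entity type, in order of appearance."""
    summary: dict[str, list[str]] = {}
    for r in sorted(results, key=lambda r: r.start):
        summary.setdefault(r.entity_type, []).append(text[r.start:r.end])
    return summary


text = "Patient John Smith (ID: 12345) was treated at Dubai Hospital"
# Offsets computed by hand for this sentence; scores are illustrative.
results = [
    Detection(start=46, end=60, score=0.88, entity_type="location"),
    Detection(start=8, end=18, score=0.92, entity_type="person"),
    Detection(start=24, end=29, score=0.61, entity_type="id"),
]
summary = summarize(text, results)
print(summary)
# {'person': ['John Smith'], 'id': ['12345'], 'location': ['Dubai Hospital']}
print(sorted(summary))  # entity types to consider when choosing rules
# ['id', 'location', 'person']
```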

Response

Each result in response.results represents one detected entity:

{
  start: 8,          // character offset (inclusive)
  end: 18,           // character offset (exclusive)
  score: 0.92,       // AI confidence (0.0–1.0)
  entity_type: "person"
}

Field	Type	Description
`start`	number	Start character offset in the input text (inclusive)
`end`	number	End character offset (exclusive)
`score`	number	Confidence score from the AI model (0.0–1.0)
`entity_type`	string	The type of entity detected (e.g. `"person"`)

Use `text.slice(result.start, result.end)` in JavaScript or TypeScript, or `text[result.start:result.end]` in Python, to extract the matched substring.
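Because `end` is exclusive, a span's length is `end - start`, and two detections overlap only when each starts before the other ends. The Python sketch below checks both against the sample result (`start: 8`, `end: 18`). One caveat: this page does not say whether "character offset" counts Unicode code points or UTF-16 code units. Python strings index by code point, while JavaScript's `slice` counts UTF-16 code units, so the two differ for characters outside the Basic Multilingual Plane, such as most emoji. `Detection`, `spans_overlap`, and `utf16_len` are hypothetical helpers.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical stand-in for one item of response.results."""
    start: int
    end: int
    score: float
    entity_type: str


def spans_overlap(a: Detection, b: Detection) -> bool:
    """Half-open spans [start, end) overlap iff each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def utf16_len(s: str) -> int:
    """Length of s in UTF-16 code units (what JavaScript's .length reports)."""
    return len(s.encode("utf-16-le")) // 2


text = "Patient John Smith (ID: 12345) was treated at Dubai Hospital"
person = Detection(start=8, end=18, score=0.92, entity_type="person")
patient_id = Detection(start=24, end=29, score=0.61, entity_type="id")  # illustrative

print(text[person.start:person.end])   # John Smith
print(person.end - person.start)       # 10
print(spans_overlap(person, patient_id))  # False

wave = "Hi \U0001F44B"  # ends with a non-BMP emoji
print(len(wave), utf16_len(wave))      # 4 5
```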

Example

const text = "Contact John Doe at john.doe@example.com"

const response = await api.scanText({
  text,
  entities: ["person", "email"],
})

for (const result of response.results) {
  const matched = text.slice(result.start, result.end)
  console.log(`${result.entity_type}: "${matched}" (score: ${result.score})`)
}
// person: "John Doe" (score: 0.94)
// email: "john.doe@example.com" (score: 0.99)

text = "Contact John Doe at john.doe@example.com"

response = scan_text.sync(
    client=client,
    body=models.ScanRequestBody(text=text, entities=["person", "email"]),
)

for result in response.results:
    matched = text[result.start : result.end]
    print(f'{result.entity_type}: "{matched}" (score: {result.score})')
# person: "John Doe" (score: 0.94)
# email: "john.doe@example.com" (score: 0.99)
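When auditing longer texts, it can help to see every detection in place. The sketch below wraps each span as `[entity_type: matched]`, working from the last span to the first so that inserting brackets never shifts the offsets of spans still to be processed. `annotate` is a hypothetical helper, not something this page documents; `Detection` mirrors the response fields, and the helper rejects overlapping spans rather than guessing how to merge them. The offsets are those of the example sentence, and the scores are the illustrative values shown above.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical stand-in for one item of response.results."""
    start: int
    end: int
    score: float
    entity_type: str


def annotate(text: str, results: list[Detection]) -> str:
    """Wrap each detected span as [entity_type: matched].

    Spans are processed from last to first, so each edit only changes text
    after the spans that remain, and their offsets stay valid.
    """
    out = text
    prev_start = len(text)
    for r in sorted(results, key=lambda r: r.start, reverse=True):
        # Reject spans that overlap the one processed before (or run past the end).
        if r.end > prev_start:
            raise ValueError(f"overlapping or out-of-range span: {r}")
        out = out[:r.start] + f"[{r.entity_type}: {out[r.start:r.end]}]" + out[r.end:]
        prev_start = r.start
    return out


text = "Contact John Doe at john.doe@example.com"
results = [
    Detection(start=8, end=16, score=0.94, entity_type="person"),
    Detection(start=20, end=40, score=0.99, entity_type="email"),
]
print(annotate(text, results))
# Contact [person: John Doe] at [email: john.doe@example.com]
```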