Faasera Profiler supports advanced techniques to detect, classify, and protect sensitive data in unstructured text. You can configure the Profiler to use AI-driven entity hinting, heuristic recognition, or a hybrid of both approaches.
This guide explains the available hint processing modes, their configuration, and how Faasera blends multiple techniques like dictionary matching, NLP models, and checksum validation to enrich entity recognition.
Faasera supports the following modes under the aiModel.hintProcessingMode setting:
| Mode | Description |
|---|---|
| AI_ONLY | Use only AI hint service results. Disables all heuristic detection. |
| HEURISTIC_ONLY | Use only heuristic recognizers (dictionary, NLP, checksum, etc.). |
| HYBRID (default) | Combine both AI and heuristic results, merging them with disambiguation. |
When enabled, Faasera sends selected text values to an external Hinting Service (e.g., GLiNER) via HTTP POST. The service returns structured entity spans:
[
{
"start": 0,
"end": 5,
"text": "John",
"label": "first_name",
"score": 0.7638
}
]
These AI-detected spans are converted into internal Span objects with configurable mapping to RecognizerType via:
"classificationToHintService": {
"FIRST_NAME": "first_name",
"EMAIL_ADDRESS": "email",
"SSN": "ssn"
}
The confidence value is preserved (rounded to 2 decimals) for sorting and disambiguation.
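As a rough illustration of the mapping step, the sketch below inverts the `classificationToHintService` mapping and rounds the score to two decimals. The function name `map_hint_spans`, the dict-based span shape, and the choice to drop unmapped labels are assumptions for illustration, not Faasera's actual implementation.

```python
import json

# Policy mapping from this guide: internal RecognizerType name -> hint-service label.
CLASSIFICATION_TO_HINT_SERVICE = {
    "FIRST_NAME": "first_name",
    "EMAIL_ADDRESS": "email",
    "SSN": "ssn",
}


def map_hint_spans(response_body, mapping=CLASSIFICATION_TO_HINT_SERVICE):
    """Convert raw hint-service spans into dicts labeled with internal type names."""
    # Invert the policy mapping so service labels can be looked up directly.
    label_to_type = {label: rtype for rtype, label in mapping.items()}
    spans = []
    for item in json.loads(response_body):
        recognizer_type = label_to_type.get(item["label"])
        if recognizer_type is None:
            continue  # assumption: labels without a mapping are ignored
        spans.append({
            "start": item["start"],
            "end": item["end"],
            "text": item["text"],
            "label": recognizer_type,
            # The score is kept, rounded to 2 decimals, for later sorting.
            "confidence": round(item["score"], 2),
        })
    return spans


response = '[{"start": 0, "end": 5, "text": "John", "label": "first_name", "score": 0.7638}]'
mapped = map_hint_spans(response)
print(mapped)
```

Inverting the mapping once keeps the per-span lookup to a single dictionary access, which matters when a service returns many spans per value.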
Heuristic-based recognition is a rules-based approach for identifying sensitive entities in text. Unlike AI models that rely on statistical learning, heuristics use predefined logic, making them fast, transparent, and **highly customizable**. This approach is ideal for regulated environments that require deterministic behavior and auditability.
Faasera Profiler implements several heuristic techniques:
Faasera leverages pre-trained NLP models to detect structured entities in free-form text.
| Entity Type | Example Text | Detected Entity |
|---|---|---|
| Person Name | “My name is Alice Carter” | Alice Carter |
| Location | “I live in New York” | New York |
These models analyze linguistic patterns (tokens, POS tags) to infer the type of entity.
Faasera supports custom term dictionaries that define known sensitive keywords or entities.
Example Dictionary:
{
"COUNTRY_NAMES": [
"Australia",
"Canada",
"Japan"
],
"HEALTH_TERMS": [
"asthma",
"diabetes",
"HIV"
]
}
Example Input:
"The patient was diagnosed with diabetes in Australia."
Output:
- diabetes → Health Term
- Australia → Country Name

Dictionaries are ideal for domain-specific use cases like healthcare, finance, or law.
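Dictionary matching of this kind can be sketched in a few lines of Python. The function name `match_dictionaries`, the whole-word matching, and the case-insensitive comparison are assumptions made for the example; the Profiler's own matching rules may differ.

```python
import re

# The example dictionary from this guide.
DICTIONARIES = {
    "COUNTRY_NAMES": ["Australia", "Canada", "Japan"],
    "HEALTH_TERMS": ["asthma", "diabetes", "HIV"],
}


def match_dictionaries(text, dictionaries=DICTIONARIES):
    """Return (start, end, matched_text, dictionary_name) for every whole-word hit."""
    hits = []
    for name, terms in dictionaries.items():
        for term in terms:
            # \b keeps "Japan" from matching inside "Japanese".
            # Case-insensitive matching is an assumption of this sketch.
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for m in pattern.finditer(text):
                hits.append((m.start(), m.end(), m.group(), name))
    return sorted(hits)  # ordered by position in the text


hits = match_dictionaries("The patient was diagnosed with diabetes in Australia.")
for start, end, term, name in hits:
    print(f"{term} -> {name} [{start}:{end}]")
```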
You can define keywords or token triggers that imply sensitivity without requiring an entity match.
Examples:
- "credit card" and a number → flag the number.
- "SSN" is present → treat the next 9-digit pattern as sensitive.

This rule-based method helps catch context-sensitive values.
Faasera supports structural validation using checksums and patterns. These techniques go beyond detection: they **verify** the integrity of the format.
| Type | Example Input | Detection Logic |
|---|---|---|
| Credit Card | 4111 1111 1111 1111 | Luhn Algorithm (valid) |
| National ID | 987654321 | Pattern + checksum |
| Tax Numbers | TFN 123 456 782 | Modulus-based checksum |
This ensures that detected entities are valid, not just pattern-matched.
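To make the checksum step concrete, here is a minimal sketch of two of the validations in the table: the Luhn algorithm for card numbers and the weighted modulus-11 check used for Australian Tax File Numbers. The function names are illustrative, and the National ID row is left out because this guide does not say which scheme it follows.

```python
def luhn_valid(number):
    """Check a card-style number with the Luhn algorithm."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Weights of the 9-digit Australian TFN check; the weighted sum must be divisible by 11.
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def tfn_valid(number):
    """Check a 9-digit TFN with its weighted modulus-11 checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) != 9:
        return False
    return sum(w * d for w, d in zip(TFN_WEIGHTS, digits)) % 11 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(tfn_valid("TFN 123 456 782"))       # True
```

A value that merely looks like a card number, such as 4111 1111 1111 1112, fails the Luhn check and can be discarded or given a lower confidence.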
Advanced users may define custom regular expressions to capture specific formats.
Example Regex:
[A-Z]{3}-\d{4}
This pattern matches strings like ABC-1234. Useful for internal identifiers or industry-specific formats.
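A quick way to try such a pattern before adding it to a policy is to run it with Python's `re` module. The word boundaries (`\b`) and the helper name `find_internal_ids` are additions for this sketch, not part of the pattern shown above.

```python
import re

# The example pattern, wrapped in word boundaries so that longer tokens
# such as "ABCD-12345" are not partially matched (an assumption of this sketch).
INTERNAL_ID = re.compile(r"\b[A-Z]{3}-\d{4}\b")


def find_internal_ids(text):
    """Return every substring that matches the internal identifier format."""
    return [m.group() for m in INTERNAL_ID.finditer(text)]


ids = find_internal_ids("Tickets ABC-1234 and XYZ-0042 are open; abc-1234 is not matched.")
print(ids)  # ['ABC-1234', 'XYZ-0042']
```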
| Benefit | Description |
|---|---|
| Speed | No external API calls or models required |
| Predictability | Deterministic logic — same input yields same result |
| Auditability | Easy to explain and document logic for compliance |
| Customizability | Users can create dictionaries, keywords, or regex rules |
| Fallback Ready | Works well as a baseline or fallback to AI-based recognition |
Heuristics form the foundation of Faasera’s detection engine and can be used independently or combined with AI hinting to balance precision and performance across varied data environments.
               ┌───────────────────────────────┐
               │   Raw Text Input from User    │
               └─────────────┬─────────────────┘
                             │
                   ┌─────────▼─────────────┐
                   │   Faasera Profiler    │
                   │  (Unstructured Mode)  │
                   └─────────┬─────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
┌───────▼───────┐   ┌────────▼─────────┐   ┌──────▼────────┐
│ AI Hint Mode  │   │ Heuristic Engine │   │ Config Engine │
│ (Model, etc..)│   │ NLP + Dictionary │   │ Policy Config │
└───────┬───────┘   └────────┬─────────┘   └──────┬────────┘
        │                    │                    │
        └────┬───────────────┴──────┬─────────────┘
             ▼                      ▼
        Hint Spans           Heuristic Spans
             │                      │
             └────┬────────┬────────┘
                  ▼        ▼
         Span Disambiguation Service
                      │
                      ▼
      Filter + Sort + Confidence Merge
                      │
                      ▼
       ┌─────────────────────────────┐
       │   Final Spans + Redaction   │
       └─────────────────────────────┘
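The merge stage at the bottom of the diagram can be approximated as follows. This is a sketch under stated assumptions: spans are plain dicts with `start`, `end`, `label`, and `confidence`; a "structural conflict" is treated as any character overlap; and the names `HintProcessingMode`, `overlaps`, and `merge_spans` are hypothetical. The confidence values are made up for the example.

```python
from enum import Enum


class HintProcessingMode(Enum):
    AI_ONLY = "AI_ONLY"
    HEURISTIC_ONLY = "HEURISTIC_ONLY"
    HYBRID = "HYBRID"


def overlaps(a, b):
    """True when two spans share at least one character position."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def merge_spans(ai_spans, heuristic_spans, mode=HintProcessingMode.HYBRID):
    """Pick candidate spans by mode, then keep the highest-confidence non-overlapping ones."""
    if mode is HintProcessingMode.AI_ONLY:
        candidates = list(ai_spans)
    elif mode is HintProcessingMode.HEURISTIC_ONLY:
        candidates = list(heuristic_spans)
    else:
        candidates = list(ai_spans) + list(heuristic_spans)
    accepted = []
    # Greedy pass: strongest spans first, weaker overlapping spans are dropped.
    for span in sorted(candidates, key=lambda s: s["confidence"], reverse=True):
        if not any(overlaps(span, kept) for kept in accepted):
            accepted.append(span)
    return sorted(accepted, key=lambda s: s["start"])


ai = [
    {"start": 0, "end": 4, "label": "FIRST_NAME", "confidence": 0.76},
    {"start": 20, "end": 35, "label": "EMAIL_ADDRESS", "confidence": 0.80},
]
heuristic = [
    {"start": 20, "end": 35, "label": "EMAIL_ADDRESS", "confidence": 0.99},
    {"start": 47, "end": 58, "label": "SSN", "confidence": 0.95},
]
merged = merge_spans(ai, heuristic)
print([(s["label"], s["confidence"]) for s in merged])
```

A greedy pass is the simplest policy that satisfies "highest confidence, no overlap"; a production disambiguation service may weigh label priority or span length as well.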
"aiModel": {
"enabled": true,
"hintProcessingMode": "HYBRID",
"classificationToHintService": {
"FIRST_NAME": "first_name",
"EMAIL_ADDRESS": "email",
"SSN": "ssn"
}
}
If hintProcessingMode is not defined in the JSON policy, the mode falls back to the default, HYBRID.
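The fallback behavior can be illustrated with a small policy reader. The function name `resolve_hint_mode` and the decision to reject unknown values with an error are assumptions of this sketch; how the `enabled` flag interacts with the mode is not covered here, so the sketch ignores it.

```python
import json

VALID_MODES = {"AI_ONLY", "HEURISTIC_ONLY", "HYBRID"}
DEFAULT_MODE = "HYBRID"


def resolve_hint_mode(policy_json):
    """Read aiModel.hintProcessingMode from a JSON policy, defaulting to HYBRID."""
    policy = json.loads(policy_json)
    ai_model = policy.get("aiModel", {})
    mode = ai_model.get("hintProcessingMode", DEFAULT_MODE)
    if mode not in VALID_MODES:
        # Assumption: unknown values are rejected rather than silently replaced.
        raise ValueError(f"Unsupported hintProcessingMode: {mode!r}")
    return mode


print(resolve_hint_mode('{"aiModel": {"enabled": true}}'))                     # HYBRID
print(resolve_hint_mode('{"aiModel": {"hintProcessingMode": "AI_ONLY"}}'))     # AI_ONLY
```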
- Each span carries a confidence score (AI-provided or default).
- Applied spans are those with the highest confidence and no structural conflict.

| Use Case | Recommended Mode |
|---|---|
| Domain-specific, AI-trained model | AI_ONLY |
| Regulated data pipelines | HEURISTIC_ONLY |
| Enterprise general-purpose usage | HYBRID (default) |
Given input:
"John Doe’s email is john@faasera.ai and SSN is 123-45-6789."
Output (combined spans):
[
{
"text": "John",
"label": "FIRST_NAME",
"confidence": 0.76
},
{
"text": "john@faasera.ai",
"label": "EMAIL_ADDRESS",
"confidence": 0.99
},
{
"text": "123-45-6789",
"label": "SSN",
"confidence": 0.95
}
]
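Once the final spans are known, redaction amounts to replacing each span's character range. The sketch below assumes each span also carries `start` and `end` offsets (as in the hint service response) and uses a bracketed label as the replacement; both the placeholder format and the `redact` helper are illustrative only.

```python
def redact(text, spans):
    """Replace each span's character range with a bracketed label placeholder."""
    redacted = text
    # Work from right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        redacted = redacted[:span["start"]] + "[" + span["label"] + "]" + redacted[span["end"]:]
    return redacted


text = "John Doe’s email is john@faasera.ai and SSN is 123-45-6789."
spans = [
    {"start": 0, "end": 4, "label": "FIRST_NAME"},
    {"start": 20, "end": 35, "label": "EMAIL_ADDRESS"},
    {"start": 47, "end": 58, "label": "SSN"},
]
result = redact(text, spans)
print(result)
```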
| Component | Purpose |
|---|---|
| HintService | Sends text to AI service and fetches hint labels |
| HintMapper | Maps AI labels to internal RecognizerTypes |
| Unstructured Profiler | Merges spans based on chosen mode and redacts text |
Need help fine-tuning your profiler pipeline? Reach out at www.faasera.ai