Faasera Documentation

Faasera AI Profiler — Hint Processing Modes

Faasera Profiler supports advanced techniques to detect, classify, and protect sensitive data in unstructured text. You can configure the Profiler to use AI-driven entity hinting, heuristic recognition, or a hybrid of both approaches.

This guide explains the available hint processing modes, their configuration, and how Faasera blends multiple techniques like dictionary matching, NLP models, and checksum validation to enrich entity recognition.


Hint Processing Modes

Faasera supports the following modes under the aiModel.hintProcessingMode setting:

Mode Description
AI_ONLY Use only AI hint service results. Disables all heuristic detection.
HEURISTIC_ONLY Use only heuristic recognizers (dictionary, NLP, checksum, etc.).
HYBRID (default) Combine both AI and heuristic results, merging them with disambiguation.

AI Hinting (FaaseraAI Hint Service)

When enabled, Faasera sends selected text values to an external Hinting Service (e.g., GLiNER) via HTTP POST. The service returns structured entity spans:

[
  {
    "start": 0,
    "end": 5,
    "text": "John",
    "label": "first_name",
    "score": 0.7638
  }
]

These AI-detected spans are converted into internal Span objects with configurable mapping to RecognizerType via:

"classificationToHintService": {
"FIRST_NAME": "first_name",
"EMAIL_ADDRESS": "email",
"SSN": "ssn"
}

The confidence value is preserved (rounded to 2 decimals) for sorting and disambiguation.


What is Heuristic Recognition?

Heuristic-based recognition is a rules-based approach for identifying sensitive entities in text. Unlike AI models that rely on statistical learning, heuristics use predefined logic, making them fast, transparent, and * *highly customizable**. This approach is ideal for regulated environments that require deterministic behavior and auditability.

Faasera Profiler implements several heuristic techniques:


1. NLP Models (Apache OpenNLP)

Faasera leverages pre-trained NLP models to detect structured entities in free-form text.

Entity Type Example Text Detected Entity
Person Name “My name is Alice Carter” Alice Carter
Location “I live in New York” New York

These models analyze linguistic patterns (tokens, POS tags) to infer the type of entity.


2. Dictionary-Based Matching

Faasera supports custom term dictionaries that define known sensitive keywords or entities.

Example Dictionary:

{
  "COUNTRY_NAMES": [
    "Australia",
    "Canada",
    "Japan"
  ],
  "HEALTH_TERMS": [
    "asthma",
    "diabetes",
    "HIV"
  ]
}

Example Input:

"The patient was diagnosed with diabetes in Australia."

Output:

Dictionaries are ideal for domain-specific use cases like healthcare, finance, or law.


3. Keyword-Based Recognition

You can define keywords or token triggers that imply sensitivity without requiring an entity match.

Examples:

This rule-based method helps catch context-sensitive values.


4. Checksum-Based Validation

Faasera supports structural validation using checksums and patterns. These techniques go beyond detection — they * *verify** the integrity of the format.

Type Example Input Detection Logic
Credit Card 4111 1111 1111 1111 Luhn Algorithm (valid)
National ID 987654321 Pattern + checksum
Tax Numbers TFN 123 456 782 Modulus-based checksum

This ensures that detected entities are valid, not just pattern-matched.


5. Regex-Based Matching (Optional)

Advanced users may define custom regular expressions to capture specific formats.

Example Regex:

[A-Z]{3}-\d{4}

This pattern matches strings like ABC-1234. Useful for internal identifiers or industry-specific formats.


Why Use Heuristics?

Benefit Description
Speed No external API calls or models required
Predictability Deterministic logic — same input yields same result
Auditability Easy to explain and document logic for compliance
Customizability Users can create dictionaries, keywords, or regex rules
Fallback Ready Works well as a baseline or fallback to AI-based recognition

Heuristics form the foundation of Faasera’s detection engine and can be used independently or combined with AI hinting to balance precision and performance across varied data environments.

Workflow

               ┌───────────────────────────────┐
               │  Raw Text Input from User     │
               └─────────────┬─────────────────┘
                             │
                   ┌─────────▼─────────────┐
                   │  Faasera Profiler     │
                   │  (Unstructured Mode)  │
                   └─────────┬─────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
┌───────▼───────┐   ┌────────▼─────────┐   ┌──────▼────────┐
│ AI Hint Mode  │   │ Heuristic Engine │   │ Config Engine │
│ (Model, etc..)│   │ NLP + Dictionary │   │ Policy Config │
└───────┬───────┘   └────────┬─────────┘   └──────┬────────┘
        │                    │                    │
        └────┬───────────────┴──────┬─────────────┘
             ▼                     ▼
         Hint Spans         Heuristic Spans
             │                     │
             └────┬────────┬───────┘
                  ▼        ▼
           Span Disambiguation Service
                  │
                  ▼
        Filter + Sort + Confidence Merge
                  │
                  ▼
        ┌─────────────────────────────┐
        │   Final Spans + Redaction   │
        └─────────────────────────────┘

How to Configure

In Your Policy JSON

"aiModel": {
"enabled": true,
"hintProcessingMode": "HYBRID",
"classificationToHintService": {
"FIRST_NAME": "first_name",
"EMAIL_ADDRESS": "email",
"SSN": "ssn"
}
}

In faasera.config.properties file

Default (fallback) mode is HYBRID if not defined in JSON policy.


Confidence & Overlap Handling


Use Case Scenarios

Use Case Recommended Mode
Domain-specific, AI-trained model AI_ONLY
Regulated data pipelines HEURISTIC_ONLY
Enterprise general-purpose usage HYBRID (default)

Output Example (HYBRID)

Given input:

"John Doe’s email is john@faasera.ai and SSN is 123-45-6789."

Output (combined spans):

[
  {
    "text": "John",
    "label": "FIRST_NAME",
    "confidence": 0.76
  },
  {
    "text": "john@faasera.ai",
    "label": "EMAIL_ADDRESS",
    "confidence": 0.99
  },
  {
    "text": "123-45-6789",
    "label": "SSN",
    "confidence": 0.95
  }
]

Summary

Component Purpose
HintService Sends text to AI service and fetches hint labels
HintMapper Maps AI labels to internal RecognizerTypes
Unstructured Profiler Merges spans based on chosen mode and redacts text

Final Notes


Need help fine-tuning your profiler pipeline? Reach out at www.faasera.ai