Glossary

Essential terms for understanding AI-powered compliance training

A

AI Hallucination

When an AI model generates content that sounds plausible but is factually incorrect or entirely fabricated. In compliance contexts, hallucinations are particularly dangerous because they can introduce false policy guidance that employees may follow. Research shows hallucination is an inherent property of large language models, not a bug that can be fully eliminated.

Related: The Black Box Problem

Audit Trail

A chronological record documenting the sequence of activities that have affected a specific operation, procedure, or event. In compliance training, audit trails prove that employees received specific training at specific times—critical for regulatory defense. SCORM-based systems create standardized audit trails.

Related: Why SCORM Still Matters
B

Bias (AI)

The presence of systematic errors or prejudices in an AI model's outputs, often reflecting biases in its training data. In compliance training, AI bias can lead to unfair or discriminatory guidance. Mitigating bias requires diverse, representative datasets and rigorous "human-in-the-loop" testing.

Bounded Knowledge

The principle of restricting an AI's operational domain to a specific, verified corpus of information. Unlike "general" AI that can discuss any topic, Bounded Knowledge systems only answer questions supported by your internal documentation, effectively preventing hallucinations about unverified topics.

Related: The Case for Boring AI
C

Chunking (Document)

The process of breaking large documents into smaller segments for processing by AI systems. In RAG architectures, source documents are typically chunked into 200-500 token segments that can be individually retrieved and referenced. Proper chunking preserves logical units and cross-references.

Related: RAG vs Fine-Tuning
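
A minimal sketch of fixed-size chunking with overlap. It uses the rough 4-characters-per-token heuristic in place of a real tokenizer, and the sizes are illustrative; production chunkers also preserve sentence and section boundaries, which this deliberately ignores.

```python
def chunk_text(text: str, chunk_tokens: int = 300, overlap_tokens: int = 30) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_tokens tokens."""
    chunk_chars = chunk_tokens * 4      # ~4 characters per token (rough heuristic)
    overlap_chars = overlap_tokens * 4  # overlap preserves context across boundaries
    step = chunk_chars - overlap_chars
    return [text[i:i + chunk_chars] for i in range(0, len(text), step)]

chunks = chunk_text("All employees must report conflicts of interest. " * 100)
```

Each chunk shares its first 120 characters with the tail of the previous chunk, so a sentence split at a boundary is still retrievable in full from at least one chunk.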

Compliance Training

Mandatory education programs designed to ensure employees understand and follow laws, regulations, and organizational policies relevant to their roles. Compliance training is required across regulated industries including finance, healthcare, and manufacturing. Effectiveness is often measured by completion rates, though research suggests behavioral change is a better metric.

Related: Measuring Training Effectiveness

Confidence Score

A numerical measure indicating how certain an AI system is about its output. In RAG systems, confidence scores are derived from retrieval quality—how closely retrieved passages match the query. Low confidence scores should trigger human review rather than automated responses.

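
The escalation logic can be sketched in a few lines; the 0.85 threshold and the routing labels are illustrative, not a fixed standard.

```python
def route_answer(retrieval_score: float, threshold: float = 0.85) -> str:
    """Decide whether to generate a grounded answer or escalate to a human."""
    if retrieval_score >= threshold:
        return "generate"           # retrieval matched well: answer from sources
    return "escalate_to_human"      # low confidence: flag for human review
```

The key design choice is that low confidence produces an explicit escalation, never a forced answer.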

Content Aggregation Model (CAM)

The SCORM specification component that defines how learning content is packaged for delivery. CAM specifies the structure of the manifest file (imsmanifest.xml) and how content resources are organized within a SCORM package.

Cosine Similarity

A mathematical measure of similarity between two vectors, commonly used in RAG systems to compare query embeddings against document embeddings. Values range from -1 to 1, with higher values indicating greater similarity. Compliance-grade RAG systems typically require similarity thresholds of 0.85 or higher.
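
The formula is the dot product of two vectors divided by the product of their magnitudes. A self-contained version using only the standard library:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Vectors pointing the same direction score 1.0 regardless of magnitude, which is why embeddings of a query and a longer passage can still match closely.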

D

Deterministic Engine

An AI configuration that prioritizes predictable, consistent outputs over creative or varied ones. In compliance, determinism is critical: the same question asked by different employees should receive the same grounded answer, ensuring standardized policy distribution.

Related: The Zero-Hallucination Guarantee
E

Embedding

A numerical representation of text (or other data) as a vector in high-dimensional space. Embeddings capture semantic meaning, allowing AI systems to find conceptually similar content even when exact words differ. Modern embedding models produce vectors with 768-1536 dimensions.

Related: RAG vs Fine-Tuning

Epistemic Humility

The quality of recognizing and acknowledging the limits of one's knowledge. In AI systems, epistemic humility means the ability to say "I don't know" when confidence is low. Generic AI lacks this quality; compliance-grade AI requires it.

Related: The Black Box Problem
F

Fine-Tuning

The process of further training a pre-trained AI model on domain-specific data. Fine-tuning adjusts model weights to incorporate new knowledge but creates provenance and auditability challenges. For compliance-critical content, RAG is generally preferred over fine-tuning.

Related: RAG vs Fine-Tuning
G

Grounded Generation

AI text generation that is constrained to information from specific source documents rather than the model's general training data. Grounding prevents hallucination by ensuring outputs can be traced to authoritative sources.

H

HIPAA (Health Insurance Portability and Accountability Act)

U.S. legislation establishing standards for protecting sensitive patient health information. HIPAA compliance training is mandatory for healthcare organizations and business associates handling protected health information (PHI).

Human-in-the-Loop (HITL)

The practice of integrating human oversight into an AI system's workflow. This includes humans reviewing AI training data, auditing "low confidence" AI responses, and refining the source documentation that grounds the AI's knowledge.

I

Inference

The process by which a trained AI model provides an output (answer) based on a given input (question). In RAG systems, inference occurs after the relevant context has been retrieved and attached to the query.

K

Kirkpatrick Model

A four-level framework for evaluating training effectiveness: (1) Reaction—learner satisfaction, (2) Learning—knowledge acquisition, (3) Behavior—on-the-job application, (4) Results—organizational impact. Most compliance training only measures Levels 1-2, missing the more important Levels 3-4.

Related: Measuring Training Effectiveness

Knowledge Firewall

An architectural boundary that prevents AI systems from accessing or generating content beyond a defined document corpus. In compliance AI, knowledge firewalls ensure the system only discusses what exists in your approved documentation.

L

Large Language Model (LLM)

AI models trained on vast text datasets that can generate human-like text. Examples include GPT-4, Claude, and Llama. LLMs are powerful but prone to hallucination when used without retrieval augmentation.

Learning Management System (LMS)

Software platform for delivering, tracking, and managing training content. An LMS tracks completion, assessment scores, and time spent—data essential for compliance documentation. Most enterprise LMS platforms support SCORM.

Learning Record Store (LRS)

A data repository for storing learning activity statements in xAPI format. Unlike LMS completion records, an LRS can capture diverse learning experiences across platforms. Required for xAPI implementations.

M

Manifest File (imsmanifest.xml)

The XML file at the core of every SCORM package that describes the package contents, structure, and metadata. The manifest tells the LMS how to launch and track the content.
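
A skeletal SCORM 1.2 manifest, trimmed to its essential structure; the identifiers, titles, and file names are placeholders, and a real package also carries schema metadata.

```xml
<manifest identifier="com.example.compliance-course" version="1.0"
          xmlns="http://www.imsproject.org/xsd/imscp_rootv1p1p2"
          xmlns:adlcp="http://www.adlnet.org/xsd/adlcp_rootv1p2">
  <organizations default="org1">
    <organization identifier="org1">
      <title>Compliance Course</title>
      <item identifier="item1" identifierref="sco1">
        <title>Module 1: Code of Conduct</title>
      </item>
    </organization>
  </organizations>
  <resources>
    <resource identifier="sco1" type="webcontent"
              adlcp:scormtype="sco" href="index.html">
      <file href="index.html"/>
    </resource>
  </resources>
</manifest>
```

The `<organizations>` tree defines the course structure the learner sees; `<resources>` maps each item to the actual files the LMS launches.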

Microlearning

Training delivered in small, focused segments (typically 3-7 minutes). Microlearning improves completion rates and retention compared to longer courses. SCORM supports microlearning through individual SCO tracking.

N

Non-Parametric Memory

In AI architecture, external knowledge stored outside the model's weights—typically in a vector database. RAG systems combine parametric memory (model weights) with non-parametric memory (retrieved documents) for more accurate, verifiable outputs.

P

Parametric Memory

Knowledge stored in an AI model's trained weights. Parametric memory cannot be easily updated, audited, or traced to specific sources—making it problematic for compliance applications.

Provenance

The documented origin and history of information. In compliance AI, provenance tracking connects every generated claim to specific source documents, enabling audit and verification.

Related: RAG vs Fine-Tuning
R

RAG (Retrieval-Augmented Generation)

An AI architecture that retrieves relevant documents before generating responses, grounding outputs in specific source material. RAG enables provenance tracking, knowledge updates without retraining, and confidence-aware generation. Introduced by Lewis et al. (2020) at Facebook AI Research.

Related: RAG vs Fine-Tuning
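
A toy, self-contained sketch of the retrieve-then-generate flow. Word-overlap scoring stands in for real embedding search, the "generation" step is stubbed, and the policy corpus is invented for illustration.

```python
CORPUS = [
    "Gifts over $50 must be reported to the compliance office.",
    "Employees must complete HIPAA training annually.",
    "Expense reports are due within 30 days of travel.",
]

def retrieve(question: str, corpus: list[str]) -> str:
    """Return the chunk sharing the most words with the question (toy retrieval)."""
    q_words = set(question.lower().split())
    return max(corpus, key=lambda c: len(q_words & set(c.lower().split())))

def answer_with_rag(question: str) -> str:
    context = retrieve(question, CORPUS)   # 1. retrieval step
    return f"Per policy: {context}"        # 2. generation grounded in the retrieved text
```

In a production system, step 1 uses embedding similarity over a vector database and step 2 prompts an LLM with the retrieved passages, but the shape of the pipeline is the same: nothing is generated that was not first retrieved.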

Retrieval Quality

A measure of how well retrieved documents match a query's intent. High retrieval quality produces relevant context for generation; low retrieval quality should trigger uncertainty flags rather than forced generation.

S

SCORM (Sharable Content Object Reference Model)

A set of technical standards for e-learning content packaging and LMS communication. Developed by the U.S. Department of Defense's ADL Initiative, SCORM enables interoperability between content and platforms. SCORM 1.2 (2001) and SCORM 2004 are widely supported.

Related: Why SCORM Still Matters

SCO (Sharable Content Object)

A discrete unit of learning content in SCORM—the smallest trackable element. Each SCO can report completion status, scores, and other data to the LMS independently.

SOC 2 (Service Organization Control 2)

An auditing framework for service organizations covering security, availability, processing integrity, confidentiality, and privacy. SOC 2 Type II certification demonstrates ongoing compliance with security controls.

T

Token

The basic unit of text processing in language models. Tokens are typically word pieces (common words are single tokens; rare words split into multiple tokens). GPT-4 uses roughly 1 token per 4 characters of English text.
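
The 4-characters-per-token rule of thumb makes a quick cost estimate easy to sketch; real tokenizers (byte-pair encoding and similar) vary by text and model, so treat this as an approximation.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count for English text using the ~4 chars/token heuristic."""
    return max(1, round(len(text) / 4))

estimate_tokens("Employees must complete annual compliance training.")
```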

Training Decay

The decline in knowledge retention and behavioral impact over time following training. Research shows compliance training effects can decay within 60-90 days, requiring ongoing reinforcement rather than annual refreshers.

Related: Measuring Training Effectiveness
V

Vector Database

A specialized database optimized for storing and searching high-dimensional vectors (embeddings). Vector databases enable fast similarity search across millions of document chunks, essential for production RAG systems.

Vector Embedding

See: Embedding

X

xAPI (Experience API)

A learning technology specification (IEEE 9274.1.1) enabling tracking of diverse learning experiences beyond traditional LMS courses. xAPI uses "Actor-Verb-Object" statements stored in a Learning Record Store. More flexible than SCORM but less standardized for compliance documentation.

Related: Why SCORM Still Matters
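
An Actor-Verb-Object statement can be expressed as a plain JSON object; the actor, verb URI, and activity id below are illustrative placeholders. An LRS receives this as JSON via its statements endpoint.

```python
import json

# Minimal xAPI statement: "Jane Doe completed the HIPAA Refresher Module".
statement = {
    "actor": {"mbox": "mailto:jane@example.com", "name": "Jane Doe"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/completed",
        "display": {"en-US": "completed"},
    },
    "object": {
        "id": "http://example.com/activities/hipaa-refresher",
        "definition": {"name": {"en-US": "HIPAA Refresher Module"}},
    },
}
payload = json.dumps(statement)  # serialized form sent to the LRS
```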

Need More Help?

Can't find a term? Have questions about compliance AI?

This glossary is maintained by Episteca.ai. Last updated: January 2025.