ADR 0034: Comprehensive Error Handling Infrastructure¶

Status: Accepted
Date: 2025-11-22
Deciders: Development Team
Related ADRs: adr0016, adr0017, adr0018, adr0033

Context¶

The concept-rag project initially had basic error handling: simple error messages with little structure. As the system grew more complex and gained integration points, it needed errors that are structured, informative, and can be handled programmatically. Several issues emerged:

Inconsistent Error Handling: Different modules threw errors in different ways, making it hard to handle errors consistently
Limited Context: Simple error messages provided insufficient information for debugging
No Programmatic Handling: Lack of error codes made it impossible to handle specific error types programmatically
Poor User Experience: Validation errors didn't guide users on how to fix input
Missing Error Propagation: Errors lost context as they propagated up the stack
No Retry Logic: Transient failures (rate limits, network issues) always failed immediately

These limitations affected debugging time, user experience, and system reliability.

Decision¶

Implement a comprehensive error handling infrastructure with:

1. Structured Exception Hierarchy¶

Create a base ConceptRAGError class with rich context:

export abstract class ConceptRAGError extends Error {
  public readonly code: string;          // e.g., "VALIDATION_TEXT_INVALID"
  public readonly context: Record<string, unknown>;  // Additional details
  public readonly timestamp: Date;
  public readonly cause?: Error;         // Original error if wrapping

  constructor(
    message: string,
    code: string,
    context: Record<string, unknown> = {},
    cause?: Error
  ) {
    super(message);
    this.name = this.constructor.name;
    this.code = code;
    this.context = context;
    this.timestamp = new Date();
    this.cause = cause;
    // captureStackTrace is V8-specific (Node.js, Chromium), so guard the call
    if (typeof Error.captureStackTrace === 'function') {
      Error.captureStackTrace(this, this.constructor);
    }
  }

  // Produces a plain object suitable for logs and structured error responses
  toJSON() {
    return {
      name: this.name,
      message: this.message,
      code: this.code,
      context: this.context,
      timestamp: this.timestamp.toISOString(),
      cause: this.cause?.message
    };
  }
}

2. Domain-Specific Error Categories¶

Create specialized error types for different failure modes:

ValidationError: Input validation failures (RequiredFieldError, ValueOutOfRangeError, InvalidFormatError)
DatabaseError: Database operation failures (RecordNotFoundError, ConnectionError, TransactionError)
EmbeddingError: Embedding provider failures (EmbeddingProviderError, RateLimitError, InvalidDimensionsError)
SearchError: Search operation failures (InvalidQueryError, SearchTimeoutError, NoResultsError)
ConfigurationError: Configuration issues (MissingConfigError, InvalidConfigError)
DocumentError: Document processing failures (UnsupportedFormatError, DocumentParseError, DocumentTooLargeError)
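
One way the categories above could be laid out is sketched below. This is a minimal illustration, not the project's actual code: the simplified base class, the constructor signatures (inferred from how InputValidator calls them in the next section), and code strings such as VALIDATION_REQUIRED_FIELD are all assumptions.

```typescript
// Simplified stand-in for the ConceptRAGError base class shown earlier.
abstract class ConceptRAGError extends Error {
  readonly timestamp = new Date();

  constructor(
    message: string,
    readonly code: string,
    readonly context: Record<string, unknown> = {},
  ) {
    super(message);
    // Keeps instanceof working even when compiling to ES5 targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}

// Category base: callers can catch every validation failure with one check.
class ValidationError extends ConceptRAGError {}

// The code strings below are hypothetical; the real values may differ.
class RequiredFieldError extends ValidationError {
  constructor(field: string) {
    super(`Field '${field}' is required`, "VALIDATION_REQUIRED_FIELD", { field });
  }
}

class ValueOutOfRangeError extends ValidationError {
  constructor(field: string, value: number, min: number, max: number) {
    super(
      `Field '${field}' must be between ${min} and ${max}, got ${value}`,
      "VALIDATION_VALUE_OUT_OF_RANGE",
      { field, value, min, max },
    );
  }
}

const rangeError = new ValueOutOfRangeError("text", 12000, 1, 10000);
console.log(rangeError instanceof ValidationError, rangeError.code, rangeError.context);
```

Keeping one intermediate class per category means a handler can match on the category (for example ValidationError) without listing every leaf type.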

3. Input Validation Layer¶

Create InputValidator service to validate at system boundaries:

export class InputValidator {
  validateSearchQuery(params: SearchQueryParams): void {
    if (!params.text || params.text.trim().length === 0) {
      throw new RequiredFieldError('text');
    }
    if (params.text.length > 10000) {
      throw new ValueOutOfRangeError('text', params.text.length, 1, 10000);
    }
  }
}

All MCP tools validate input before executing operations.

4. Error Wrapping Pattern¶

Repositories wrap infrastructure errors with domain context:

async findByName(name: string): Promise<Concept> {
  try {
    return await this.db.query(name);
  } catch (error) {
    if (error instanceof DatabaseError) {
      throw error;  // Re-throw domain errors
    }
    // Wrap infrastructure errors with context
    throw new DatabaseError(
      'Failed to retrieve concept',
      'query',
      error as Error
    );
  }
}
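
The wrap-or-rethrow rule above can be exercised in isolation. The sketch below assumes a simplified DatabaseError whose constructor mirrors the call in the snippet (message, operation, original error); withDatabaseErrors is a hypothetical helper, not part of the project.

```typescript
// Simplified stand-in for the project's DatabaseError. The wrapped error is
// stored as originalError here to avoid depending on ES2022 Error.cause typings.
class DatabaseError extends Error {
  constructor(
    message: string,
    readonly operation: string,
    readonly originalError?: Error,
  ) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}

// Hypothetical helper: runs a database call, passes domain errors through
// untouched, and wraps anything else in a DatabaseError.
async function withDatabaseErrors<T>(
  operation: string,
  fn: () => Promise<T>,
): Promise<T> {
  try {
    return await fn();
  } catch (error) {
    if (error instanceof DatabaseError) {
      throw error; // Already a domain error: do not double-wrap.
    }
    // Non-Error throwables (strings, plain objects) are normalized first.
    const original = error instanceof Error ? error : new Error(String(error));
    throw new DatabaseError(`Database ${operation} failed`, operation, original);
  }
}

// Usage: a low-level failure surfaces as a DatabaseError that keeps its origin.
withDatabaseErrors("query", async () => {
  throw new TypeError("socket closed");
}).catch((e: unknown) => {
  if (e instanceof DatabaseError) {
    console.log(e.message, "<-", e.originalError?.message);
  }
});
```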

5. Retry Logic with Exponential Backoff¶

Implement retry service for transient errors:

export class RetryService {
  // isRetryable() and sleep() are helper methods of this class (not shown)
  async withRetry<T>(
    operation: () => Promise<T>,
    options: RetryOptions = {}
  ): Promise<T> {
    // maxRetries bounds the total number of attempts, including the first
    const { maxRetries = 3, backoffMs = 1000 } = options;
    let lastError: Error | undefined;

    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await operation();
      } catch (error) {
        lastError = error as Error;

        // Don't retry validation errors or other non-retryable failures
        if (error instanceof ValidationError || !this.isRetryable(error)) {
          throw error;
        }

        // Exponential backoff between attempts; no wait after the final one
        if (attempt < maxRetries - 1) {
          await this.sleep(backoffMs * Math.pow(2, attempt));
        }
      }
    }

    throw lastError ?? new Error('withRetry requires maxRetries >= 1');
  }
}
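
To make the backoff schedule concrete, here is a standalone sketch of the same loop as a plain function. TransientError and the injectable sleep option are assumptions added for illustration; the real isRetryable() check may inspect error classes or codes instead.

```typescript
// Hypothetical marker for transient failures (rate limits, network blips).
class TransientError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface RetryOptions {
  maxRetries?: number; // total attempts, including the first
  backoffMs?: number; // delay before the second attempt
  sleep?: (ms: number) => Promise<void>; // injectable so tests can skip waiting
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  operation: () => Promise<T>,
  { maxRetries = 3, backoffMs = 1000, sleep = defaultSleep }: RetryOptions = {},
): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const isLastAttempt = attempt === maxRetries - 1;
      if (!(error instanceof TransientError) || isLastAttempt) {
        throw error;
      }
      // Delays double each time: backoffMs, 2 * backoffMs, 4 * backoffMs, ...
      await sleep(backoffMs * 2 ** attempt);
    }
  }
  // Only reached when maxRetries <= 0, i.e. nothing was attempted.
  throw new Error("maxRetries must be at least 1");
}

// Usage: fails twice, then succeeds on the third call after waits of 100 and 200 ms.
let calls = 0;
const delays: number[] = [];
withRetry(
  async () => {
    calls += 1;
    if (calls < 3) throw new TransientError("rate limited");
    return "ok";
  },
  { backoffMs: 100, sleep: async (ms) => { delays.push(ms); } },
).then((value) => console.log(value, calls, delays));
```

With the defaults (three attempts, 1000 ms base), the waits would be 1000 ms and 2000 ms, so a persistently failing call gives up after roughly three seconds plus the time spent in the operation itself.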

6. JSDoc Documentation¶

Document all public methods with @throws tags:

/**
 * Find a concept by name.
 * @param name - The concept name (case-sensitive)
 * @returns The concept if found
 * @throws {RecordNotFoundError} If concept does not exist
 * @throws {DatabaseError} If database query fails
 */
async findByName(name: string): Promise<Concept>

Implementation¶

Date: 2025-11-22
Pull Request: #12 (merged)
Effort: ~3 hours (agentic implementation)

Components Created¶

Domain Layer (src/domain/exceptions/):
- base.ts - Base ConceptRAGError class
- validation.ts - Validation error types (6 classes)
- database.ts - Database error types (5 classes)
- embedding.ts - Embedding error types (4 classes)
- search.ts - Search error types (4 classes)
- configuration.ts - Configuration error types (3 classes)
- document.ts - Document error types (4 classes)

Domain Services (src/domain/services/validation/):
- InputValidator.ts - Input validation service

Infrastructure (src/infrastructure/retry/):
- retry-service.ts - Retry logic with exponential backoff

Changes Applied¶

Repositories Updated (12 methods):
- LanceDBCatalogRepository - 4 methods with error wrapping
- LanceDBCategoryRepository - 8 methods with error wrapping
- LanceDBConceptRepository - JSDoc @throws added
- LanceDBConnection - Connection error handling

Tools Updated (10 tools):
- All MCP tools now use InputValidator
- Consistent error handling via BaseTool.handleError()
- Structured error responses with codes and context
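
As an illustration of what a structured error response might look like, the sketch below maps any thrown value to a response object. The response shape (modeled on an MCP tool result with isError set), the UNKNOWN_ERROR fallback code, and the body of handleError are assumptions; the actual BaseTool.handleError() may differ.

```typescript
// Minimal stand-in for ConceptRAGError, reduced to code and context.
class ConceptRAGError extends Error {
  constructor(
    message: string,
    readonly code: string,
    readonly context: Record<string, unknown> = {},
  ) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}

// Assumed response shape for a failed tool call.
interface ToolErrorResponse {
  isError: true;
  content: Array<{ type: "text"; text: string }>;
}

// Hypothetical equivalent of BaseTool.handleError(): structured errors keep
// their code and context; anything else falls back to a generic code.
function handleError(error: unknown): ToolErrorResponse {
  const payload =
    error instanceof ConceptRAGError
      ? { code: error.code, message: error.message, context: error.context }
      : {
          code: "UNKNOWN_ERROR", // hypothetical fallback code
          message: error instanceof Error ? error.message : String(error),
          context: {},
        };
  return {
    isError: true,
    content: [{ type: "text", text: JSON.stringify({ error: payload }) }],
  };
}

const response = handleError(
  new ConceptRAGError("Field 'text' is required", "VALIDATION_REQUIRED_FIELD", {
    field: "text",
  }),
);
console.log(response.content[0]?.text);
```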

Documentation:
- Added @throws JSDoc to 3 domain services
- Added @throws to all repository methods
- Added @throws to database connection methods

Test Coverage¶

Tests Created/Updated:
- 64 unit tests for error classes
- 18 integration tests for error handling
- 4 existing tests updated for structured errors

Results:
- ✅ All 615 tests passing
- ✅ Coverage: 76.51% statements, 68.87% branches
- ✅ Domain exceptions: 100% coverage
- ✅ Validation service: 90.62% coverage

Consequences¶

Positive¶

Better Debugging
- Error codes identify issues programmatically
- Context provides specific details (entity, value, operation)
- Timestamps help correlate errors in logs
- Cause chains preserve original errors

Improved User Experience
- Structured errors are machine-readable
- Validation errors guide users to fix input
- Consistent format across all operations

Enhanced Reliability
- Input validation prevents invalid operations
- Retry logic handles transient failures
- Error wrapping prevents information loss
- Connection errors clearly identified

Better Maintainability
- JSDoc documents error contracts
- Consistent patterns across codebase
- Clear separation of concerns
- Easy to add new error types

Programmatic Error Handling
- Error codes enable type-specific handling
- Context enables conditional retry logic
- Machine-readable error responses
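
The code-based handling listed above might look like the following on the caller side. This is a sketch under assumptions: the stand-in error class is simplified, and the specific code strings other than the VALIDATION_ prefix are hypothetical.

```typescript
// Minimal stand-in for a structured error carrying a code and context.
class ConceptRAGError extends Error {
  constructor(
    message: string,
    readonly code: string,
    readonly context: Record<string, unknown> = {},
  ) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}

type ErrorAction = "retry" | "fix-input" | "report";

// Hypothetical policy: decide what to do from the error code alone,
// without parsing the human-readable message.
function decideAction(error: unknown): ErrorAction {
  if (!(error instanceof ConceptRAGError)) {
    return "report"; // Unstructured errors carry no code to branch on.
  }
  switch (error.code) {
    case "EMBEDDING_RATE_LIMIT": // hypothetical code
    case "DATABASE_CONNECTION": // hypothetical code
      return "retry";
    default:
      return error.code.startsWith("VALIDATION_") ? "fix-input" : "report";
  }
}

console.log(decideAction(new ConceptRAGError("Too many requests", "EMBEDDING_RATE_LIMIT")));
console.log(decideAction(new ConceptRAGError("Text is required", "VALIDATION_TEXT_INVALID")));
```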

Negative¶

Increased Verbosity
- More code required for error handling
- Multiple error classes to maintain
- Mitigation: Clear patterns reduce cognitive load

Learning Curve
- New developers must learn error hierarchy
- Must understand when to wrap vs re-throw
- Mitigation: Comprehensive documentation and examples

Test Complexity
- Tests must verify error codes and context
- More assertions required per test
- Mitigation: Test utilities and helpers
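
A test helper of the kind mentioned as a mitigation could be as small as the sketch below. expectErrorCode is a hypothetical name, and the error class is a simplified stand-in; the project's own test utilities are not shown in this ADR.

```typescript
// Minimal stand-in for a structured error carrying a code and context.
class ConceptRAGError extends Error {
  constructor(
    message: string,
    readonly code: string,
    readonly context: Record<string, unknown> = {},
  ) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}

// Hypothetical helper: runs fn, checks that it throws a structured error with
// the expected code, and returns that error for further context assertions.
function expectErrorCode(fn: () => unknown, expectedCode: string): ConceptRAGError {
  try {
    fn();
  } catch (error) {
    if (!(error instanceof ConceptRAGError)) {
      throw new Error(`Expected a ConceptRAGError, got: ${String(error)}`);
    }
    if (error.code !== expectedCode) {
      throw new Error(`Expected code ${expectedCode}, got ${error.code}`);
    }
    return error;
  }
  throw new Error(`Expected ${expectedCode} to be thrown, but nothing was`);
}

// Usage: one call covers the type and code checks, leaving only context to assert.
const caught = expectErrorCode(() => {
  throw new ConceptRAGError("Field 'text' is required", "VALIDATION_REQUIRED_FIELD", {
    field: "text",
  });
}, "VALIDATION_REQUIRED_FIELD");
console.log(caught.context);
```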

Neutral¶

Breaking Changes: None - Error responses enhanced but backward compatible
Performance: Negligible overhead from error object creation
Dependencies: No new external dependencies

Alternatives Considered¶

1. Result Type Pattern (Rust-style)¶

Approach: Use Result<T, E> type instead of exceptions:

type Result<T, E> = { ok: true, value: T } | { ok: false, error: E };

async findByName(name: string): Promise<Result<Concept, DatabaseError>>

Pros:
- Explicit error handling in type signatures
- Forces consideration of error cases
- No exception propagation

Cons:
- Major breaking change to all APIs
- Requires wrapping every operation
- Less idiomatic in TypeScript/JavaScript
- Would require massive refactoring

Decision: Rejected - Too disruptive for incremental improvement
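
For comparison, this is roughly what the rejected Result style would ask of every call site. The in-memory lookup and the describeConcept caller are hypothetical, used only to show the branching.

```typescript
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface Concept {
  name: string;
}

class DatabaseError extends Error {}

// Hypothetical in-memory store standing in for a repository.
const concepts = new Map<string, Concept>([["rag", { name: "rag" }]]);

async function findByName(name: string): Promise<Result<Concept, DatabaseError>> {
  const concept = concepts.get(name);
  if (concept === undefined) {
    return { ok: false, error: new DatabaseError(`Concept '${name}' not found`) };
  }
  return { ok: true, value: concept };
}

// Every caller must branch on `ok` before it can touch the value.
async function describeConcept(name: string): Promise<string> {
  const lookup = await findByName(name);
  if (lookup.ok) {
    return `found: ${lookup.value.name}`;
  }
  return `error: ${lookup.error.message}`;
}

describeConcept("rag").then((line) => console.log(line));
describeConcept("missing").then((line) => console.log(line));
```

The compiler enforces the check, which is the pattern's main benefit, but each existing method signature and caller would have to change to adopt it.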

2. Error Codes Only (No Exception Classes)¶

Approach: Use generic Error with error codes:

throw new Error('VALIDATION_TEXT_INVALID: Text is required');

Pros:
- Simpler implementation
- No class hierarchy to maintain
- Lightweight

Cons:
- No structured context
- No type safety for error handling
- Harder to extract error information
- No inheritance for common behavior

Decision: Rejected - Insufficient structure and type safety
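
The extraction problem can be seen in a short sketch: with the code embedded in the message, callers have to parse strings. extractCode is a hypothetical helper that assumes the "CODE: message" convention from the example above.

```typescript
// Pulls a leading upper-case code off the message, if one is present.
function extractCode(error: Error): string | undefined {
  const match = /^([A-Z][A-Z0-9_]*): /.exec(error.message);
  return match ? match[1] : undefined;
}

const coded = new Error("VALIDATION_TEXT_INVALID: Text is required");
console.log(extractCode(coded)); // "VALIDATION_TEXT_INVALID"

// Any error that skips the convention silently yields no code at all.
console.log(extractCode(new Error("Text is required"))); // undefined
```

Nothing in the type system ties a thrown Error to this convention, so a missing or misspelled prefix is only discovered at runtime.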

3. AOP-Style Error Interceptors¶

Approach: Use decorators/interceptors for automatic error handling:

@HandleErrors(DatabaseError)
async findByName(name: string): Promise<Concept>

Pros:
- Less boilerplate in methods
- Centralized error handling
- Declarative style

Cons:
- Requires decorators (the legacy form sits behind TypeScript's experimentalDecorators flag; standard decorators need TypeScript 5.0+)
- Less explicit about what errors are thrown
- Harder to customize per method
- Magic behavior harder to understand

Decision: Rejected - Too implicit, experimental features

4. Functional Error Handling Library (fp-ts)¶

Approach: Use Either/Option types from fp-ts:

import { Either, left, right } from 'fp-ts/Either';

async findByName(name: string): Promise<Either<DatabaseError, Concept>>

Pros:
- Battle-tested functional patterns
- Rich ecosystem of combinators
- Type-safe error handling

Cons:
- Large dependency (fp-ts)
- Steep learning curve for team
- Unfamiliar patterns in TypeScript
- Major API changes required

Decision: Rejected - Too much cognitive overhead for benefits

Evidence¶

Implementation Artifacts¶

Planning Document: 04-error-handling-plan.md
Implementation Summary: IMPLEMENTATION-SUMMARY.md
Pull Request: #12 - https://github.com/m2ux/concept-rag/pull/12
Test Results: TEST-RESULTS.md

Commit History¶

ae39e9f feat: implement comprehensive error handling infrastructure
b62ee6b test: add integration tests for error handling
4d51d0e test: add comprehensive unit tests for error handling
598d74a feat: add retry service with exponential backoff
121beed refactor: update repositories to use new exception hierarchy
09802d5 feat: add input validation and structured error handling to MCP tools
cf0f846 feat: add comprehensive input validation service
afc21a7 feat: implement comprehensive exception hierarchy

Metrics¶

Files Changed:
- 15 files modified
- +413 insertions, -163 deletions

Test Coverage:
- Domain exceptions: 100%
- Validation service: 90.62%
- Infrastructure search: 97.52%
- Overall: 76.51% statements

Error Categories Created:
- 7 error category modules
- 26 specialized error classes
- 1 base error class
- 1 validation service
- 1 retry service

Knowledge Base Sources¶

This decision was informed by:
- "Programming Rust" - Error handling patterns
- "Clean Architecture" - Error propagation across boundaries
- Error handling best practices from the Software Engineering category
- Industry standards for exception hierarchies

Related ADRs:
adr0016 - Provides layered architecture for error boundaries
adr0017 - Repository pattern benefits from consistent error handling
adr0018 - DI enables injecting retry/error handling services
adr0033 - BaseTool provides error handling for all MCP tools

Future Considerations¶

Error Telemetry: Add OpenTelemetry integration for error tracking
Error Recovery Strategies: Implement circuit breaker pattern for failing services
Error Analytics: Track error frequency and patterns for system health monitoring
User-Facing Errors: Add i18n support for user-friendly error messages
Error Aggregation: Implement error aggregation for bulk operations

Notes¶

This ADR represents a significant maturation of the error handling infrastructure, moving from ad-hoc error messages to a comprehensive, structured system. The implementation was completed in a single focused effort with zero breaking changes; the error classes have 100% test coverage, and overall statement coverage is 76.51%.

The error hierarchy strikes a balance between simplicity and expressiveness, providing enough structure for programmatic handling while remaining intuitive for developers to use and extend.

References:
- Implementation: planning
- Pull Request: #12
- Test Coverage: 100% for error classes, 90.62% for validation service