Skip to content

ADR 0034: Comprehensive Error Handling Infrastructure

Status: Accepted
Date: 2025-11-22
Deciders: Development Team
Related ADRs: adr0016, adr0017

Context

The concept-rag project initially had basic error handling with simple error messages and limited structure. As the system grew in complexity and the number of integration points increased, the need for structured, informative, and programmatically-handleable errors became critical. Several issues emerged:

  1. Inconsistent Error Handling: Different modules threw errors in different ways, making it hard to handle errors consistently
  2. Limited Context: Simple error messages provided insufficient information for debugging
  3. No Programmatic Handling: Lack of error codes made it impossible to handle specific error types programmatically
  4. Poor User Experience: Validation errors didn't guide users on how to fix input
  5. Missing Error Propagation: Errors lost context as they propagated up the stack
  6. No Retry Logic: Transient failures (rate limits, network issues) always failed immediately

These limitations affected debugging time, user experience, and system reliability.

Decision

Implement a comprehensive error handling infrastructure with:

1. Structured Exception Hierarchy

Create a base ConceptRAGError class with rich context:

export abstract class ConceptRAGError extends Error {
  public readonly code: string;          // e.g., "VALIDATION_TEXT_INVALID"
  public readonly context: Record<string, unknown>;  // Additional details
  public readonly timestamp: Date;
  public readonly cause?: Error;         // Original error if wrapping

  constructor(message: string, code: string, context = {}, cause?: Error) {
    super(message);
    this.name = this.constructor.name;
    this.code = code;
    this.context = context;
    this.timestamp = new Date();
    this.cause = cause;
    Error.captureStackTrace(this, this.constructor);
  }

  toJSON() {
    return {
      name: this.name,
      message: this.message,
      code: this.code,
      context: this.context,
      timestamp: this.timestamp.toISOString(),
      cause: this.cause?.message
    };
  }
}

2. Domain-Specific Error Categories

Create specialized error types for different failure modes:

  • ValidationError: Input validation failures (RequiredFieldError, ValueOutOfRangeError, InvalidFormatError)
  • DatabaseError: Database operation failures (RecordNotFoundError, ConnectionError, TransactionError)
  • EmbeddingError: Embedding provider failures (EmbeddingProviderError, RateLimitError, InvalidDimensionsError)
  • SearchError: Search operation failures (InvalidQueryError, SearchTimeoutError, NoResultsError)
  • ConfigurationError: Configuration issues (MissingConfigError, InvalidConfigError)
  • DocumentError: Document processing failures (UnsupportedFormatError, DocumentParseError, DocumentTooLargeError)

3. Input Validation Layer

Create InputValidator service to validate at system boundaries:

export class InputValidator {
  validateSearchQuery(params: SearchQueryParams): void {
    if (!params.text || params.text.trim().length === 0) {
      throw new RequiredFieldError('text');
    }
    if (params.text.length > 10000) {
      throw new ValueOutOfRangeError('text', params.text.length, 1, 10000);
    }
  }
}

All MCP tools validate input before executing operations.

4. Error Wrapping Pattern

Repositories wrap infrastructure errors with domain context:

async findByName(name: string): Promise<Concept> {
  try {
    return await this.db.query(name);
  } catch (error) {
    if (error instanceof DatabaseError) {
      throw error;  // Re-throw domain errors
    }
    // Wrap infrastructure errors with context
    throw new DatabaseError(
      'Failed to retrieve concept',
      'query',
      error as Error
    );
  }
}

5. Retry Logic with Exponential Backoff

Implement retry service for transient errors:

export class RetryService {
  async withRetry<T>(
    operation: () => Promise<T>,
    options: RetryOptions = {}
  ): Promise<T> {
    const { maxRetries = 3, backoffMs = 1000 } = options;
    let lastError: Error | undefined;

    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await operation();
      } catch (error) {
        lastError = error as Error;

        // Don't retry validation errors
        if (error instanceof ValidationError) {
          throw error;
        }

        // Exponential backoff for retryable errors
        if (this.isRetryable(error)) {
          await this.sleep(backoffMs * Math.pow(2, attempt));
          continue;
        }

        throw error;
      }
    }

    throw lastError!;
  }
}

6. JSDoc Documentation

Document all public methods with @throws tags:

/**
 * Find a concept by name.
 * @param name - The concept name (case-sensitive)
 * @returns The concept if found
 * @throws {RecordNotFoundError} If concept does not exist
 * @throws {DatabaseError} If database query fails
 */
async findByName(name: string): Promise<Concept>

Implementation

Date: 2025-11-22
Pull Request: #12 (merged)
Time: ~3 hours agentic implementation

Components Created

Domain Layer (src/domain/exceptions/): - base.ts - Base ConceptRAGError class - validation.ts - Validation error types (6 classes) - database.ts - Database error types (5 classes) - embedding.ts - Embedding error types (4 classes) - search.ts - Search error types (4 classes) - configuration.ts - Configuration error types (3 classes) - document.ts - Document error types (4 classes)

Domain Services (src/domain/services/validation/): - InputValidator.ts - Input validation service

Infrastructure (src/infrastructure/retry/): - retry-service.ts - Retry logic with exponential backoff

Changes Applied

Repositories Updated (12 methods): - LanceDBCatalogRepository - 4 methods with error wrapping - LanceDBCategoryRepository - 8 methods with error wrapping - LanceDBConceptRepository - JSDoc @throws added - LanceDBConnection - Connection error handling

Tools Updated (10 tools): - All MCP tools now use InputValidator - Consistent error handling via BaseTool.handleError() - Structured error responses with codes and context

Documentation: - Added @throws JSDoc to 3 domain services - Added @throws to all repository methods - Added @throws to database connection methods

Test Coverage

Tests Created/Updated: - 64 unit tests for error classes - 18 integration tests for error handling - 4 existing tests updated for structured errors

Results: - ✅ All 615 tests passing - ✅ Coverage: 76.51% statements, 68.87% branches - ✅ Domain exceptions: 100% coverage - ✅ Validation service: 90.62% coverage

Consequences

Positive

  1. Better Debugging
  2. Error codes identify issues programmatically
  3. Context provides specific details (entity, value, operation)
  4. Timestamps help correlate errors in logs
  5. Cause chains preserve original errors

  6. Improved User Experience

  7. Structured errors are machine-readable
  8. Validation errors guide users to fix input
  9. Consistent format across all operations

  10. Enhanced Reliability

  11. Input validation prevents invalid operations
  12. Retry logic handles transient failures
  13. Error wrapping prevents information loss
  14. Connection errors clearly identified

  15. Better Maintainability

  16. JSDoc documents error contracts
  17. Consistent patterns across codebase
  18. Clear separation of concerns
  19. Easy to add new error types

  20. Programmatic Error Handling

  21. Error codes enable type-specific handling
  22. Context enables conditional retry logic
  23. Machine-readable error responses

Negative

  1. Increased Verbosity
  2. More code required for error handling
  3. Multiple error classes to maintain
  4. Mitigation: Clear patterns reduce cognitive load

  5. Learning Curve

  6. New developers must learn error hierarchy
  7. Must understand when to wrap vs re-throw
  8. Mitigation: Comprehensive documentation and examples

  9. Test Complexity

  10. Tests must verify error codes and context
  11. More assertions required per test
  12. Mitigation: Test utilities and helpers

Neutral

  1. Breaking Changes: None - Error responses enhanced but backward compatible
  2. Performance: Negligible overhead from error object creation
  3. Dependencies: No new external dependencies

Alternatives Considered

1. Result Type Pattern (Rust-style)

Approach: Use Result<T, E> type instead of exceptions:

type Result<T, E> = { ok: true, value: T } | { ok: false, error: E };

async findByName(name: string): Promise<Result<Concept, DatabaseError>>

Pros: - Explicit error handling in type signatures - Forces consideration of error cases - No exception propagation

Cons: - Major breaking change to all APIs - Requires wrapping every operation - Less idiomatic in TypeScript/JavaScript - Would require massive refactoring

Decision: Rejected - Too disruptive for incremental improvement

2. Error Codes Only (No Exception Classes)

Approach: Use generic Error with error codes:

throw new Error('VALIDATION_TEXT_INVALID: Text is required');

Pros: - Simpler implementation - No class hierarchy to maintain - Lightweight

Cons: - No structured context - No type safety for error handling - Harder to extract error information - No inheritance for common behavior

Decision: Rejected - Insufficient structure and type safety

3. AOP-Style Error Interceptors

Approach: Use decorators/interceptors for automatic error handling:

@HandleErrors(DatabaseError)
async findByName(name: string): Promise<Concept>

Pros: - Less boilerplate in methods - Centralized error handling - Declarative style

Cons: - Requires decorators (experimental in TypeScript) - Less explicit about what errors are thrown - Harder to customize per method - Magic behavior harder to understand

Decision: Rejected - Too implicit, experimental features

4. Functional Error Handling Library (fp-ts)

Approach: Use Either/Option types from fp-ts:

import { Either, left, right } from 'fp-ts/Either';

async findByName(name: string): Promise<Either<DatabaseError, Concept>>

Pros: - Battle-tested functional patterns - Rich ecosystem of combinators - Type-safe error handling

Cons: - Large dependency (fp-ts) - Steep learning curve for team - Unfamiliar patterns in TypeScript - Major API changes required

Decision: Rejected - Too much cognitive overhead for benefits

Evidence

Implementation Artifacts

  1. Planning Document: 04-error-handling-plan.md
  2. Implementation Summary: IMPLEMENTATION-SUMMARY.md
  3. Pull Request: #12 - https://github.com/m2ux/concept-rag/pull/12
  4. Test Results: TEST-RESULTS.md

Commit History

ae39e9f feat: implement comprehensive error handling infrastructure
b62ee6b test: add integration tests for error handling
4d51d0e test: add comprehensive unit tests for error handling
598d74a feat: add retry service with exponential backoff
121beed refactor: update repositories to use new exception hierarchy
09802d5 feat: add input validation and structured error handling to MCP tools
cf0f846 feat: add comprehensive input validation service
afc21a7 feat: implement comprehensive exception hierarchy

Metrics

Files Changed: - 15 files modified - +413 insertions, -163 deletions

Test Coverage: - Domain exceptions: 100% - Validation service: 90.62% - Infrastructure search: 97.52% - Overall: 76.51% statements

Error Categories Created: - 7 error category modules - 26 specialized error classes - 1 base error class - 1 validation service - 1 retry service

Knowledge Base Sources

This decision was informed by: - "Programming Rust" - Error handling patterns - "Clean Architecture" - Error propagation across boundaries - Error handling best practices from Software Engineering category - Industry standards for exception hierarchies

  • adr0016 - Provides layered architecture for error boundaries
  • adr0017 - Repository pattern benefits from consistent error handling
  • adr0018 - DI enables injecting retry/error handling services
  • adr0033 - BaseTool provides error handling for all MCP tools

Future Considerations

  1. Error Telemetry: Add OpenTelemetry integration for error tracking
  2. Error Recovery Strategies: Implement circuit breaker pattern for failing services
  3. Error Analytics: Track error frequency and patterns for system health monitoring
  4. User-Facing Errors: Add i18n support for user-friendly error messages
  5. Error Aggregation: Implement error aggregation for bulk operations

Notes

This ADR represents a significant maturation of the error handling infrastructure, moving from ad-hoc error messages to a comprehensive, structured system. The implementation was completed in a single focused effort with zero breaking changes and full test coverage.

The error hierarchy strikes a balance between simplicity and expressiveness, providing enough structure for programmatic handling while remaining intuitive for developers to use and extend.


References: - Implementation: planning - Pull Request: #12 - Test Coverage: 100% for error classes, 90.62% for validation service