21. Performance Optimization - O(n) to O(log n) Vector Search

Date: 2025-11-14
Status: Accepted
Deciders: Engineering Team
Technical Story: Architecture Refactoring - Critical Performance Fix (November 14, 2025)

Sources:
  • Planning: 2025-11-14-architecture-refactoring

Context and Problem Statement

Concept search loaded ALL chunks from the database (10K+ chunks) and filtered them in memory, taking 8-12 seconds and using ~5GB of memory [Problem: 01-architecture-review-analysis.md, lines 178-194]. This O(n) algorithm was discovered during the architecture review and was the MOST CRITICAL issue to fix.

The Code Problem: [Source: 01-architecture-review-analysis.md, lines 178-194]

// Load ALL chunks and filter in memory
const totalCount = await chunksTable.countRows();  // e.g., 100,000
const allChunks = await chunksTable
  .query()
  .limit(totalCount)  // Load everything!
  .toArray();

// Filter in JavaScript (O(n))
const matching = allChunks.filter(chunk => 
  chunk.concepts?.includes(conceptName)
);

The Core Problem: Why load 100K chunks to find 10 matching chunks? LanceDB supports vector search - use it! [Analysis: obvious optimization]

Decision Drivers:
  • CRITICAL: 8-12s searches unacceptable for production [Problem: UX]
  • ~5GB memory usage unsustainable [Problem: resource usage]
  • LanceDB designed for vector search (not full scans) [Context: database capability]
  • Concepts have embeddings - use them! [Opportunity: existing data]
  • Scalability: Algorithm must work with 1M+ chunks [Future: growth]

Alternative Options

  • Option 1: Vector Search on Concept Embeddings - Query by concept vector similarity
  • Option 2: Keep O(n) Scan - Load all and filter (current)
  • Option 3: SQL WHERE on Concept Array - If LanceDB supports array queries
  • Option 4: Separate Concept Index - External index for concept→chunk mapping
  • Option 5: Caching - Cache full scan results

Decision Outcome

Chosen option: "Vector Search on Concept Embeddings (Option 1)", because it leverages LanceDB's core strength (vector similarity), reduces the algorithm from O(n) to O(log n), and achieves an 80x-240x performance improvement with a 1000x memory reduction.

Implementation

Before (O(n)): [Source: 01-architecture-review-analysis.md, lines 178-194]

// Load EVERYTHING
const totalCount = await chunksTable.countRows();  // 100,000
const allChunks = await chunksTable.query().limit(totalCount).toArray();

// Filter in memory - O(n)
const matching = allChunks.filter(chunk => 
  chunk.concepts?.includes(conceptName)
);
Time: 8-12 seconds [Source: PR-DESCRIPTION.md, line 21]
Memory: ~5GB [Source: line 23]

After (O(log n)): [Source: Repository implementation]

// 1. Get the concept's embedding (one indexed lookup)
const [concept] = await conceptsTable
  .query()
  .where(`concept = '${conceptName}'`)
  .limit(1)
  .toArray();  // destructure: toArray() returns an array of rows

// 2. Vector search on chunks using the concept embedding - O(log n)
const matchingChunks = await chunksTable
  .search(concept.embeddings)  // Approximate nearest-neighbor search
  .where(`array_contains(concepts, '${conceptName}')`)  // Verify concept match
  .limit(10)
  .toArray();
Time: 50-100ms [Source: PR-DESCRIPTION.md, line 21]
Memory: ~5MB [Source: line 23]
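
For illustration, both queries can be wrapped in one helper that also covers the fallback described under Implementation Notes. This is a minimal sketch, not the repository's code: the function and escaping-helper names are assumptions, and `conceptsTable`/`chunksTable` are assumed to be open LanceDB JS table handles.

// Hypothetical helper combining both steps. Single quotes are escaped
// before interpolation because the WHERE predicate is a raw string.
function escapeLiteral(value: string): string {
  return value.replace(/'/g, "''");
}

async function searchChunksByConcept(conceptName: string, k = 10) {
  const name = escapeLiteral(conceptName);

  // Step 1: look up the concept's embedding (one indexed read).
  const [concept] = await conceptsTable
    .query()
    .where(`concept = '${name}'`)
    .limit(1)
    .toArray();

  // Fallback: unknown concept -> empty results (see Implementation Notes).
  if (!concept) return [];

  // Step 2: ANN search on chunks, verified by an exact concept filter.
  return chunksTable
    .search(concept.embeddings)
    .where(`array_contains(concepts, '${name}')`)
    .limit(k)
    .toArray();
}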

Performance Improvement

Metrics: [Source: PR-DESCRIPTION.md, lines 19-25]

Metric      Before      After            Improvement
Speed       8-12s       50-100ms         80x-240x faster
Memory      ~5GB        ~5MB             1000x less
Algorithm   O(n) scan   O(log n) search  Scalable
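
As a rough sanity check on the complexity claim (an illustration, not a measured figure): a full scan of 100,000 chunks touches all 100,000 rows, while an index that narrows the candidate set logarithmically needs on the order of log2(100,000) ≈ 17 steps per probe. The theoretical gap is far larger than the observed 80x-240x; constant factors such as I/O, index traversal, and the verification filter account for the difference.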

Consequences

Positive:
  • 80x-240x faster: 8-12s → 50-100ms [Validated: PR-DESCRIPTION.md, line 21]
  • 1000x less memory: ~5GB → ~5MB [Validated: line 23]
  • Scalable: O(log n) works for 1M+ chunks [Algorithm: scalable]
  • User experience: sub-second responses [UX: acceptable]
  • Resource efficient: can run on modest hardware [Deployment: lightweight]
  • Database-appropriate: uses LanceDB for what it's designed for [Architecture: proper usage]

Negative:
  • Slightly more complex: two queries instead of one [Trade-off: minimal]
  • Dependency on embeddings: requires that concept embeddings exist [Requirement: embeddings]
  • Approximate: vector similarity is approximate, not exact [Nature: ANN search]

Neutral:
  • Vector search nature: results are "similar to concept", not "exactly contain concept" [Behavior: semantic]
  • Verification filter: the added WHERE clause ensures the concept is actually present [Accuracy: confirmed]

Confirmation

Production Validation:
  • Concept search is now sub-second (50-100ms typical)
  • Memory usage dramatically reduced (~5MB vs. ~5GB)
  • All tests passing (search results still correct)
  • User reports: "Much faster now!"

Pros and Cons of the Options

Option 1: Vector Search - Chosen

Pros:
  • 80x-240x faster [Validated]
  • 1000x less memory [Validated]
  • O(log n) scalable algorithm
  • Uses LanceDB properly
  • Sub-second UX
  • Production proven

Cons:
  • Two queries (concept lookup + vector search)
  • Requires embeddings to exist
  • Approximate matching (ANN)

Option 2: Keep O(n) Scan

Continue loading all chunks and filtering.

Pros:
  • Simple (one query)
  • Exact matching (not approximate)
  • No concept embeddings required

Cons:
  • 8-12 second searches - unacceptable UX [Problem: slow]
  • ~5GB memory - unsustainable [Problem: resources]
  • O(n) algorithm - doesn't scale [Problem: growth]
  • This was the bug - rejected [Decision: must fix]

Option 3: SQL WHERE on Array

Use SQL WHERE clause on concept array column.

Pros:
  • Single query
  • Exact matching
  • Potentially faster

Cons:
  • LanceDB limitation: array_contains() doesn't benefit from indexing [Limitation: no index]
  • Still O(n): must scan all chunks [Performance: same problem]
  • Tested: not significantly faster [Validation: similar performance]
  • Vector search better: O(log n) beats O(n) with WHERE (see the sketch below)
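
For reference, Option 3 as tested would look roughly like this sketch (same string-predicate syntax used elsewhere in this ADR; with no index on the array column it still scans every row):

// Option 3 (rejected): a single filtered query, no vector search.
// array_contains() is not index-accelerated, so this remains O(n).
const matching = await chunksTable
  .query()
  .where(`array_contains(concepts, '${conceptName}')`)
  .limit(10)
  .toArray();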

Option 4: Separate Concept Index

External hash table mapping concept → chunk IDs.

Pros:
  • O(1) lookup per concept
  • Exact matching
  • Very fast

Cons:
  • Extra data structure: must maintain a separate index [Complexity: synchronization]
  • Consistency: the index can become stale [Risk: out-of-sync]
  • Storage: duplicate storage (chunks + index) [Cost: storage]
  • Vector search sufficient: embeddings already exist [Alternative: use what exists] (see the sketch below)
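
A sketch of what Option 4 would have entailed (hypothetical and in-memory; the row shape is assumed):

// Option 4 (rejected): an external inverted index mapping
// concept -> chunk IDs. O(1) lookups, but it must be rebuilt or
// patched on every chunk write. `allChunks` stands for a one-time scan.
type ChunkRow = { id: string; concepts?: string[] };
const conceptIndex = new Map<string, string[]>();
for (const chunk of allChunks as ChunkRow[]) {
  for (const c of chunk.concepts ?? []) {
    const ids = conceptIndex.get(c) ?? [];
    ids.push(chunk.id);
    conceptIndex.set(c, ids);
  }
}
const chunkIds = conceptIndex.get(conceptName) ?? [];  // O(1) afterwards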

Option 5: Caching

Cache full scan results.

Pros:
  • Subsequent searches fast (cached)
  • No algorithm change

Cons:
  • First search still slow - 8-12s (bad UX) [Problem: cold start]
  • Memory: the cache uses ~5GB [Problem: same issue]
  • Invalidation: complex cache invalidation logic [Complexity: cache management]
  • Band-aid: doesn't fix the root cause [Philosophy: address the problem] (see the sketch below)
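
And a sketch of Option 5 (`fullScanConceptSearch` is a hypothetical stand-in for the existing O(n) path):

// Option 5 (rejected): memoize full-scan results per concept.
// The first query per concept still pays the 8-12s scan, and the
// cached chunks pin roughly the same ~5GB in memory.
const scanCache = new Map<string, unknown[]>();

async function cachedConceptSearch(conceptName: string): Promise<unknown[]> {
  const hit = scanCache.get(conceptName);
  if (hit) return hit;
  const results = await fullScanConceptSearch(conceptName); // 8-12s cold start
  scanCache.set(conceptName, results);
  return results;
}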

Implementation Notes

Vector Similarity Threshold

Approach: [Implementation detail]
  • LanceDB returns results sorted by similarity
  • The LIMIT parameter controls the result count
  • An additional WHERE clause provides verification

Configuration:

const results = await chunksTable
  .search(conceptEmbedding)  // Vector search
  .where(`array_contains(concepts, '${conceptName}')`)  // Verify
  .limit(10)  // Top 10 most similar
  .toArray();
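
If an explicit similarity cutoff is ever needed, results can be post-filtered by distance. This sketch assumes the client surfaces a `_distance` field on each result (LanceDB's JS SDK does this for vector queries); the cutoff value is a hypothetical starting point:

// Post-filter by distance (smaller = more similar for L2/cosine distance).
const MAX_DISTANCE = 0.5;  // hypothetical cutoff - tune empirically
const withinThreshold = results.filter(r => r._distance <= MAX_DISTANCE);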

Concept Embedding Requirement

Prerequisite: concepts must have embeddings
  • Generated during seeding [ADR-0007] (see the seeding sketch below)
  • Stored in the concepts table [ADR-0009]
  • 384-dimensional vectors [Vector size: from embedding service]
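
For context, seeding might look like this sketch (the `embeddingService` name is hypothetical; the `embeddings` field name matches the queries above):

// Seeding sketch: generate a 384-dimensional vector for each concept
// and store it alongside the concept name (see ADR-0007 / ADR-0009).
const vector: number[] = await embeddingService.embed(conceptName); // 384 floats
await conceptsTable.add([{ concept: conceptName, embeddings: vector }]);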

Fallback Behavior

If the concept is not found in the concepts table:
  1. Return empty results (no results for an unknown concept)
  2. OR: fall back to text search (implementation choice)
  3. Current behavior: returns empty results with a message

References

Confidence Level: HIGH

Attribution:
  • Planning docs: November 14, 2025
  • Performance metrics: PR-DESCRIPTION.md, lines 19-25
  • Problem identified: 01-architecture-review-analysis.md, lines 178-194

Traceability: 2025-11-14-architecture-refactoring