System Architecture

CoquiTitle is built as a serverless pipeline using AWS Lambda functions, with Supabase PostgreSQL for data storage and Google Vertex AI for LLM processing.

High-Level Architecture

Lambda Functions

| Lambda                            | Type      | Memory  | Timeout | Purpose                            |
|-----------------------------------|-----------|---------|---------|------------------------------------|
| coquititle-api                    | Zip       | 512 MB  | 30 s    | REST API endpoints                 |
| coquititle-ocr-processor          | Zip       | 2048 MB | 300 s   | Document AI OCR processing         |
| coquititle-extractor              | Container | 1024 MB | 900 s   | Multi-pass data extraction         |
| coquititle-pending-docs-processor | Zip       | 1024 MB | 300 s   | Pending document processing        |
| coquititle-title-state-builder    | Container | 512 MB  | 30 s    | Deterministic title derivation     |
| coquititle-evidence-resolver      | Container | 512 MB  | 120 s   | Evidence validation + LLM fallback |
| coquititle-report-generator       | Container | 1024 MB | 900 s   | Multi-pass report generation       |

Technology Stack

Cloud Infrastructure

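| Service  | Provider               | Purpose                       |
|----------|------------------------|-------------------------------|
| Compute  | AWS Lambda (container) | Pipeline execution            |
| Storage  | AWS S3                 | PDFs, JSON reports            |
| Database | Supabase PostgreSQL    | Structured data, RLS          |
| Realtime | Supabase Realtime      | Progress events via WebSocket |
| Secrets  | AWS Secrets Manager    | API keys, DB credentials      |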

AI/ML Services

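| Service           | Provider           | Model                           | Purpose                      |
|-------------------|--------------------|---------------------------------|------------------------------|
| OCR               | Google Document AI | Enterprise OCR                  | Text + token extraction      |
| Extraction        | Google Vertex AI   | Gemini 2.5 Flash (configurable) | Multimodal 2-pass extraction |
| Report Gen        | Google Vertex AI   | Gemini 2.5 Flash (configurable) | Multi-pass prose generation  |
| Evidence Fallback | Google Vertex AI   | Gemini 2.5 Flash                | LLM matching for garbled OCR |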

Shared Modules

The pipeline uses several shared Python modules located in lambdas/shared/:

LLM Client (shared/llm_client.py)

Centralizes all Vertex AI interactions:

  • Thread-safe singleton client initialization
  • Automatic Langfuse tracing integration
  • Retry logic with exponential backoff for rate limits
  • Context caching for multimodal content
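The first and third bullets above can be sketched as follows. This is a minimal illustration, not the contents of shared/llm_client.py: the names get_client, call_with_backoff, and RateLimitError are hypothetical, and the factory argument stands in for whatever actually constructs the Vertex AI client.

```python
import random
import threading
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider rate-limit error."""


_client = None
_client_lock = threading.Lock()


def get_client(factory):
    """Return the shared client, creating it at most once across threads."""
    global _client
    if _client is None:              # fast path: already initialized
        with _client_lock:
            if _client is None:      # re-check while holding the lock
                _client = factory()
    return _client


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise                # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so concurrent Lambdas do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The sleep parameter is injectable only so the retry schedule can be tested without waiting; in normal use the default time.sleep applies.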

Prompt Registry (shared/prompt_registry.py)

Manages prompts via Langfuse with local fallbacks:

  • Remote prompt engineering updates without code deploys
  • Mustache template support
  • Automatic fallback to local prompts if Langfuse unavailable
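The fallback behavior described above might look roughly like this. It is a sketch under stated assumptions: LOCAL_PROMPTS, render, and get_prompt are hypothetical names, fetch_remote stands in for the Langfuse lookup, and the renderer handles only simple {{name}} variables rather than full Mustache syntax.

```python
import re

# Hypothetical local fallback prompts, keyed by prompt name.
LOCAL_PROMPTS = {
    "extract_property": "Extract the property description from {{document_type}}.",
}

_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, variables):
    """Substitute {{name}} placeholders; missing names render as empty strings."""
    return _VAR.sub(lambda m: str(variables.get(m.group(1), "")), template)


def get_prompt(name, fetch_remote, **variables):
    """Prefer the remote template; fall back to the local copy on any failure."""
    try:
        template = fetch_remote(name)
    except Exception:
        template = LOCAL_PROMPTS[name]
    return render(template, variables)
```

Catching a broad Exception here is deliberate for the sketch: any remote failure should degrade to the local prompt rather than fail the pipeline run.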

Title State Module (shared/title_state/)

Deterministic ownership derivation (version tsb_v2):

  • build_title_state() - Derives ownership state from extraction data
  • build_chain_of_title() - Builds chronological chain interleaving acquisitions and events
  • derive_current_rights() - Computes final ownership from ganancial/condominio/individual rules
  • rights_derivation.py - Core derivation rules and validation
  • No LLM dependency, so the same input always produces the same output
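As an illustration of the chronological interleaving attributed to build_chain_of_title(), the sketch below merges two lists by date. The function name interleave_chronologically and the record fields (date, owner, type, kind) are assumptions made for this example, not the module's actual schema.

```python
from datetime import date


def interleave_chronologically(acquisitions, events):
    """Merge acquisitions and events into one list ordered by date.

    sorted() is stable, so on equal dates acquisitions stay ahead of events,
    which keeps the result deterministic for identical input.
    """
    tagged = [dict(item, kind="acquisition") for item in acquisitions]
    tagged += [dict(item, kind="event") for item in events]
    return sorted(tagged, key=lambda item: item["date"])


chain = interleave_chronologically(
    [{"date": date(2001, 5, 1), "owner": "A"},
     {"date": date(2010, 3, 9), "owner": "B"}],
    [{"date": date(2005, 1, 1), "type": "mortgage"}],
)
```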

Output structure:

{
    "version": "tsb_v2",
    "chain_of_title": [...],      // Chronological list of acquisitions + events
    "confidence_summary": {...},  // Status: confident/needs_review/unknown
    "review_flags": [...],        // Validation issues with severity
    "derivation_map": {...}       // Audit trail of which rules produced which outputs
}
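A consumer of this structure might decide whether a case needs manual review along these lines. The sketch assumes that confidence_summary carries a "status" key and that each review flag carries a "severity" key with "high" as one possible value; neither the key names nor the severity values are confirmed by the structure above.

```python
def needs_human_review(title_state):
    """Return True when the derived title state should be checked by a person."""
    if title_state.get("version") != "tsb_v2":
        raise ValueError("unsupported title state version")
    # Assumed key: confidence_summary["status"]; a missing status counts as unknown.
    status = title_state.get("confidence_summary", {}).get("status", "unknown")
    # Assumed key and value: each flag has "severity", and "high" is a valid level.
    high_flags = [flag for flag in title_state.get("review_flags", [])
                  if flag.get("severity") == "high"]
    return status != "confident" or bool(high_flags)
```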

Status Flow

Real-time Progress

Progress events are emitted to case_events and delivered via Supabase Realtime:

emit_event(conn, case_id, 'extraction', 'Pass 1: property complete (1/3)', 15,
    metadata={
        'substep': 'pass1_property',
        'input_tokens': 12500,
        'output_tokens': 1200,
        'result_summary': '4 linderos, 1 cabida, 12 evidence'
    })

Events are tagged with run_id so the UI can subscribe to the current run.
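To make the run_id tagging concrete, here is a self-contained sketch that uses an in-memory SQLite table as a stand-in for the PostgreSQL case_events table. The column names, the run_id keyword argument, and the events_for_run helper are assumptions for illustration; the real emit_event may obtain run_id differently, and the UI would filter through a Supabase Realtime subscription rather than a query.

```python
import json
import sqlite3


def emit_event(conn, case_id, step, message, progress, metadata=None, run_id=None):
    """Insert one progress event row (sketch; the schema is assumed)."""
    conn.execute(
        "INSERT INTO case_events (case_id, run_id, step, message, progress, metadata) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (case_id, run_id, step, message, progress, json.dumps(metadata or {})),
    )
    conn.commit()


def events_for_run(conn, run_id):
    """Return (step, message, progress) rows for a single run, oldest first."""
    rows = conn.execute(
        "SELECT step, message, progress FROM case_events "
        "WHERE run_id = ? ORDER BY id",
        (run_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE case_events (id INTEGER PRIMARY KEY, case_id TEXT, run_id TEXT, "
    "step TEXT, message TEXT, progress INTEGER, metadata TEXT)"
)
emit_event(conn, "case-1", "extraction", "Pass 1: property complete (1/3)", 15,
           metadata={"substep": "pass1_property"}, run_id="run-a")
emit_event(conn, "case-1", "extraction", "Stale event from an earlier run", 90,
           run_id="run-old")
current = events_for_run(conn, "run-a")
```

Filtering on run_id keeps events from an earlier run of the same case out of the progress display for the current one.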