
Evidence Resolution

The Evidence Resolver validates evidence citations from extraction and resolves them to pixel-level bounding boxes for UI visualization.

Lambda: coquititle-evidence-resolver
Technology: Deterministic matching + visual grounding + parallel LLM fallback

Three-Phase Processing

Evidence resolution uses a tiered approach to maximize accuracy while minimizing LLM costs:

Phase 1: Deterministic Matching (Serial)

Fast, cost-free matching is attempted first:

Method          | Confidence | Description
exact           | 1.0        | Quote found as exact substring in claimed line
fuzzy_full_line | 0.8        | Normalized text match (lowercase, collapsed whitespace)
nearby_exact    | 1.0        | Exact match found in adjacent line (±2 lines)
nearby_fuzzy    | 0.8        | Fuzzy match in adjacent line

# Normalization for fuzzy matching
import re

def normalize_text(text):
    text = text.lower()
    text = re.sub(r'\s+', ' ', text)
    return text.strip()
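
Putting the four methods together, the cascade below is a minimal sketch of how a resolver might try them in order (function and argument names are illustrative; it reuses normalize_text from above and returns the method and confidence from the table, or None to fall through to the later phases):

def match_deterministic(quote, lines, claimed_idx, window=2):
    """Try exact, fuzzy, then nearby-line matches; return (method, confidence) or None."""
    claimed = lines[claimed_idx]
    if quote in claimed:
        return "exact", 1.0
    if normalize_text(quote) in normalize_text(claimed):
        return "fuzzy_full_line", 0.8
    # Scan adjacent lines (claimed line ± window) with the same two checks.
    for offset in range(1, window + 1):
        for idx in (claimed_idx - offset, claimed_idx + offset):
            if 0 <= idx < len(lines):
                if quote in lines[idx]:
                    return "nearby_exact", 1.0
                if normalize_text(quote) in normalize_text(lines[idx]):
                    return "nearby_fuzzy", 0.8
    return None  # fall through to visual grounding / LLM fallback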

Phase 1.5: Visual Grounding (Parallel)

When deterministic matching fails, visual grounding uses Gemini to visually locate the extracted value on the PDF page image. This is particularly effective when OCR text doesn't match due to formatting or rendering differences, but the value is clearly visible on the page.

Method           | Confidence | Description
visual_grounding | 0.8        | Gemini visually locates value on PDF page

Parallel Processing:

  • Up to 12 concurrent visual grounding calls (MAX_VG_WORKERS = 12)
  • A circuit breaker limits visual grounding to 100 attempts per invocation to control API costs
  • Pre-fetched PDF bytes and Gemini caches minimize database access in threads
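
A minimal sketch of this fan-out, assuming a resolve_visually(item) helper that wraps the Gemini call (the helper and the item fields are placeholders; only the worker count and circuit-breaker limit come from the description above):

from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_VG_WORKERS = 12
MAX_VG_ATTEMPTS = 100  # circuit breaker: cap Gemini calls per invocation

def run_visual_grounding(unresolved_items, resolve_visually):
    """Attempt visual grounding on up to MAX_VG_ATTEMPTS items, 12 at a time."""
    to_attempt = unresolved_items[:MAX_VG_ATTEMPTS]
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_VG_WORKERS) as pool:
        futures = {pool.submit(resolve_visually, item): item["source_id"]
                   for item in to_attempt}
        for future in as_completed(futures):
            results[futures[future]] = future.result()  # bbox dict or None
    return results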

Per-Document Caching:

To avoid re-uploading the same PDF for multiple evidence references on the same document, visual grounding uses per-document Gemini context caching:

  • PDF pages are uploaded once and cached for 15 minutes
  • Subsequent evidence items on the same document reuse the cached context
  • LRU eviction with max 10 cached documents per invocation
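
A minimal sketch of such a cache, assuming a hypothetical upload_pdf_to_gemini(doc_id) helper; the TTL, cache size, and LRU eviction follow the bullets above, while the cache-entry contents depend on the actual Gemini caching API:

import time
from collections import OrderedDict

CACHE_TTL_SECONDS = 15 * 60   # Gemini context cache lifetime
MAX_CACHED_DOCS = 10          # LRU eviction beyond this many documents

_doc_cache = OrderedDict()    # doc_id -> (cached_context, created_at)

def get_cached_document(doc_id, upload_pdf_to_gemini):
    """Return a cached Gemini context for doc_id, uploading the PDF only once."""
    entry = _doc_cache.get(doc_id)
    if entry and time.time() - entry[1] < CACHE_TTL_SECONDS:
        _doc_cache.move_to_end(doc_id)          # mark as most recently used
        return entry[0]
    context = upload_pdf_to_gemini(doc_id)      # one upload per document
    _doc_cache[doc_id] = (context, time.time())
    _doc_cache.move_to_end(doc_id)
    while len(_doc_cache) > MAX_CACHED_DOCS:
        _doc_cache.popitem(last=False)          # evict least recently used
    return context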

Phase 2: LLM Fallback (Parallel)

For evidence that fails both deterministic matching and visual grounding (common with garbled OCR from handwritten documents where the value isn't visually identifiable):

Method      | Confidence | Description
llm         | 0.85       | LLM-based token matching
approximate | 0.3        | Last-resort: highlight claimed line ± 1
failed      | 0.0        | No match found

Parallel Processing:

  • Up to 12 concurrent LLM calls (MAX_LLM_WORKERS = 12)
  • Each call resolves one evidence citation
  • Pre-fetched tokens and page context for thread safety
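
The fallback fan-out can be sketched the same way, with all database reads done up front so the worker threads never touch a connection (fetch_context and resolve_with_llm are placeholder names, not the Lambda's actual functions):

from concurrent.futures import ThreadPoolExecutor

MAX_LLM_WORKERS = 12

def run_llm_fallback(conn, items, fetch_context, resolve_with_llm):
    """Pre-fetch tokens/pages on the main thread, then resolve items in parallel."""
    # All database reads happen here, before any threads are started.
    contexts = {item["source_id"]: fetch_context(conn, item) for item in items}
    with ThreadPoolExecutor(max_workers=MAX_LLM_WORKERS) as pool:
        results = list(pool.map(
            lambda item: resolve_with_llm(item, contexts[item["source_id"]]),
            items,
        ))
    return results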

LLM Fallback Architecture

The LLM fallback handles garbled OCR where substring matching fails:

LLM Prompt Structure

The prompt explains that OCR of handwritten documents produces garbled text:

"Muñoz Bermúdez" might appear as "Locanto mu Berdas"
"inscripción primera" might appear as "inscripclon primcra"

The LLM identifies which tokens contain the value, despite distortion.
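
A rough sketch of how such a prompt could be assembled; the wording and the token_lines structure are illustrative, not the production prompt:

def build_fallback_prompt(field_value, claimed_line_id, token_lines):
    """Assemble the garbled-OCR matching prompt for one evidence citation."""
    lines_block = "\n".join(
        f'{line_id}: {" ".join(tok["text"] for tok in tokens)}'
        for line_id, tokens in token_lines.items()
    )
    return (
        "The text below comes from OCR of a handwritten document, so words may be "
        'garbled (e.g. "Muñoz Bermúdez" may appear as "Locanto mu Berdas").\n'
        f'Find the tokens that correspond to the value: "{field_value}".\n'
        f"The value was claimed near line {claimed_line_id}.\n"
        "Reply with the line id and token indexes that contain the value, "
        "or NONE if it is not present.\n\n"
        f"{lines_block}"
    )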

Page-Scoped Context

To reduce token usage (~90% reduction), the LLM receives only:

  • Target page ± 1 adjacent pages
  • Tokens from claimed line ± 5 lines

def fetch_document_catalog_for_page(conn, doc_id, target_page):
    """Fetch line-indexed catalog for target page ± 1."""
    # Only 3 pages max instead of entire document
    return pages_dict

Bounding Box Resolution

Once tokens are identified, bounding boxes are extracted:

Coordinate Systems

Source          | Format                   | Range
Gemini (visual) | [ymin, xmin, ymax, xmax] | 0-1000
OCR Tokens      | {x, y, width, height}    | 0-1 normalized
Database        | {x, y, width, height}    | 0-1 normalized

Conversion

def convert_gemini_bbox_to_normalized(bbox):
    """Convert Gemini's 0-1000 to normalized 0-1."""
    ymin, xmin, ymax, xmax = bbox
    return {
        'x': xmin / 1000.0,
        'y': ymin / 1000.0,
        'width': (xmax - xmin) / 1000.0,
        'height': (ymax - ymin) / 1000.0
    }
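
For example, a Gemini box of [320, 150, 340, 400] converts to {"x": 0.15, "y": 0.32, "width": 0.25, "height": 0.02}, the same normalized format used for OCR tokens and the database.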

Validation

Bboxes are validated before storage:

  • Coordinates must be in valid range
  • Inverted coordinates are auto-corrected
  • Zero-area bboxes are rejected
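
A minimal sketch of these checks; the function name is a placeholder, and handling out-of-range coordinates by clamping (rather than rejecting outright) is an assumption:

def validate_bbox(bbox):
    """Fix inverted corners, clamp to the 0-1 range, and reject zero-area boxes."""
    x, y = bbox["x"], bbox["y"]
    w, h = bbox["width"], bbox["height"]
    # Auto-correct inverted coordinates (negative width/height).
    if w < 0:
        x, w = x + w, -w
    if h < 0:
        y, h = y + h, -h
    # Clamp everything into the normalized 0-1 range.
    x, y = max(0.0, min(x, 1.0)), max(0.0, min(y, 1.0))
    w, h = min(w, 1.0 - x), min(h, 1.0 - y)
    if w <= 0 or h <= 0:
        return None  # zero-area bbox: reject
    return {"x": x, "y": y, "width": w, "height": h}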

Visual Evidence Handling

For multimodal extraction where the LLM reads directly from PDF images:

Priority | Method               | Confidence | Description
1        | visual_direct_bbox   | 0.9        | Direct bbox from Gemini
2        | visual_ocr_match     | 0.7        | Found visual quote in OCR text
3        | visual_page_fallback | 0.4        | Highlight entire claimed page
4        | visual_no_ocr        | 0.2        | No OCR lines found
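
A sketch of walking this priority order for one visual evidence item (the inputs gemini_bbox, ocr_match, and page_lines are placeholders for whatever the earlier steps produced):

def resolve_visual_evidence(gemini_bbox, ocr_match, page_lines):
    """Walk the visual-evidence priorities from strongest to weakest."""
    if gemini_bbox:                     # 1. direct bbox returned by Gemini
        return {"bboxes": [gemini_bbox], "match_method": "visual_direct_bbox",
                "match_confidence": 0.9}
    if ocr_match:                       # 2. visual quote found in OCR text
        return {"bboxes": ocr_match["bboxes"], "match_method": "visual_ocr_match",
                "match_confidence": 0.7}
    if page_lines:                      # 3. highlight the whole claimed page
        return {"bboxes": [line["bbox"] for line in page_lines],
                "match_method": "visual_page_fallback", "match_confidence": 0.4}
    return {"bboxes": [], "match_method": "visual_no_ocr",  # 4. nothing to highlight
            "match_confidence": 0.2}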

Output Schema

Each resolved evidence source is stored in evidence_sources:

{
  "source_id": "uuid",
  "extraction_id": "uuid",
  "field_path": "titulares[0].name",
  "field_value": "Juan Pérez García",
  "doc_id": "uuid",
  "page_no": 2,
  "line_id": "D1-P2-L045",
  "bboxes": [
    {"x": 0.15, "y": 0.32, "width": 0.25, "height": 0.02}
  ],
  "match_method": "exact",
  "match_confidence": 1.0
}

Performance Metrics

Typical resolution statistics:

Metric                     | Value
Deterministic success rate | ~75%
LLM fallback success rate  | ~85% of remaining
Overall resolution rate    | ~96%
Avg LLM fallback latency   | 2-3s per call
Parallel batch time        | ~15s for 50 evidence items

Langfuse Integration

Each LLM fallback call is traced with:

  • evidence.llm_fallback generation span
  • Field path and claimed line metadata
  • Token usage metrics
  • Match result scoring
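
As a rough sketch, assuming the v2-style Langfuse Python SDK (langfuse.trace / trace.generation); the metadata keys, item fields, and call_llm helper are illustrative:

from langfuse import Langfuse

langfuse = Langfuse()  # reads API keys from environment variables

def traced_llm_fallback(item, call_llm):
    """Wrap one LLM fallback call in an evidence.llm_fallback generation span."""
    trace = langfuse.trace(name="evidence-resolution",
                           metadata={"doc_id": item["doc_id"]})
    generation = trace.generation(
        name="evidence.llm_fallback",
        input=item["field_value"],
        metadata={"field_path": item["field_path"], "claimed_line": item["line_id"]},
    )
    result = call_llm(item)
    # Token usage and match-result scoring can be attached when ending the span.
    generation.end(output=result)
    return result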