Evidence Resolution
The Evidence Resolver validates evidence citations from extraction and resolves them to pixel-level bounding boxes for UI visualization.
Lambda: coquititle-evidence-resolver
Technology: Deterministic matching + visual grounding + parallel LLM fallback
Three-Phase Processing
Evidence resolution uses a tiered approach to maximize accuracy while minimizing LLM costs:
Phase 1: Deterministic Matching (Serial)
Fast, cost-free matching attempted first:
| Method | Confidence | Description |
|---|---|---|
| exact | 1.0 | Quote found as exact substring in claimed line |
| fuzzy_full_line | 0.8 | Normalized text match (lowercase, collapsed whitespace) |
| nearby_exact | 1.0 | Exact match found in adjacent line (±2 lines) |
| nearby_fuzzy | 0.8 | Fuzzy match in adjacent line |
```python
import re

# Normalization for fuzzy matching
def normalize_text(text):
    """Lowercase and collapse whitespace runs to single spaces."""
    text = text.lower()
    text = re.sub(r'\s+', ' ', text)
    return text.strip()
```
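Conceptually, the four strategies are tried in order of cost. A minimal sketch of that cascade follows, assuming lines are held as a list of strings indexed by line number; `match_line` and `find_match` are illustrative names, not the Lambda's actual code:

```python
def match_line(quote, line_text):
    """Try exact, then fuzzy, matching against a single line."""
    if quote in line_text:
        return 'exact', 1.0
    if normalize_text(quote) in normalize_text(line_text):
        return 'fuzzy_full_line', 0.8
    return None

def find_match(quote, lines, claimed_idx):
    """Check the claimed line first, then adjacent lines (±2)."""
    hit = match_line(quote, lines[claimed_idx])
    if hit:
        return hit
    for offset in (-1, 1, -2, 2):  # nearest lines first
        idx = claimed_idx + offset
        if 0 <= idx < len(lines):
            hit = match_line(quote, lines[idx])
            if hit:
                method, _ = hit
                return ('nearby_exact', 1.0) if method == 'exact' else ('nearby_fuzzy', 0.8)
    return None
```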
Phase 1.5: Visual Grounding (Parallel)
When deterministic matching fails, visual grounding uses Gemini to visually locate the extracted value on the PDF page image. This is particularly effective when OCR text doesn't match due to formatting or rendering differences, but the value is clearly visible on the page.
| Method | Confidence | Description |
|---|---|---|
| visual_grounding | 0.8 | Gemini visually locates value on PDF page |
Parallel Processing:
- Up to 12 concurrent visual grounding calls (`MAX_VG_WORKERS = 12`)
- Circuit breaker limits to 100 attempts per invocation to control API costs
- Pre-fetched PDF bytes and Gemini caches minimize database access in threads (see the sketch after this list)
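A minimal sketch of the fan-out with the circuit breaker, assuming an illustrative `ground_one(item, pdf_bytes)` helper that calls Gemini and returns a bbox or None:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_VG_WORKERS = 12
MAX_VG_ATTEMPTS = 100  # circuit breaker: cap Gemini calls per invocation

def run_visual_grounding(unresolved, pdf_bytes_by_doc):
    """Ground up to MAX_VG_ATTEMPTS items in parallel.

    Items beyond the cap fall through to Phase 2. ground_one is an
    illustrative helper, not the Lambda's actual function."""
    attempts = unresolved[:MAX_VG_ATTEMPTS]
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_VG_WORKERS) as pool:
        futures = {
            pool.submit(ground_one, item, pdf_bytes_by_doc[item['doc_id']]): item
            for item in attempts
        }
        for future in as_completed(futures):
            results[futures[future]['source_id']] = future.result()  # bbox dict or None
    return results
```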
Per-Document Caching:
To avoid re-uploading the same PDF for multiple evidence references on the same document, visual grounding uses per-document Gemini context caching:
- PDF pages are uploaded once and cached for 15 minutes
- Subsequent evidence items on the same document reuse the cached context
- LRU eviction with max 10 cached documents per invocation (sketched below)
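A sketch of that cache policy under the stated constraints; `create_context` stands in for the upload-and-cache call and is an assumption, not the actual helper:

```python
import time
from collections import OrderedDict

CACHE_TTL_SECONDS = 15 * 60   # Gemini context cache lifetime
MAX_CACHED_DOCS = 10          # LRU capacity per invocation

_doc_cache = OrderedDict()    # doc_id -> (cached_context, created_at)

def get_cached_context(doc_id, create_context):
    """Reuse a per-document Gemini cache if still fresh, else create and LRU-evict."""
    entry = _doc_cache.get(doc_id)
    if entry and time.time() - entry[1] < CACHE_TTL_SECONDS:
        _doc_cache.move_to_end(doc_id)      # mark as most recently used
        return entry[0]
    context = create_context(doc_id)        # upload PDF pages once
    _doc_cache[doc_id] = (context, time.time())
    if len(_doc_cache) > MAX_CACHED_DOCS:
        _doc_cache.popitem(last=False)      # evict least recently used
    return context
```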
Phase 2: LLM Fallback (Parallel)
For evidence that fails both deterministic matching and visual grounding (common with garbled OCR from handwritten documents where the value isn't visually identifiable):
| Method | Confidence | Description |
|---|---|---|
| llm | 0.85 | LLM-based token matching |
| approximate | 0.3 | Last resort: highlight claimed line ± 1 |
| failed | 0.0 | No match found |
Parallel Processing:
- Up to 12 concurrent LLM calls (`MAX_LLM_WORKERS = 12`)
- Each call resolves one evidence citation
- Pre-fetched tokens and page context for thread safety (see the sketch after this list)
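The fan-out mirrors Phase 1.5. A compact sketch, in which `resolve_one_citation` and the pre-fetched `context_by_source_id` mapping are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_LLM_WORKERS = 12

def run_llm_fallback(unresolved, context_by_source_id):
    """Resolve each remaining citation with one LLM call per item.

    Tokens and page context are pre-fetched into context_by_source_id so
    worker threads never touch the database connection."""
    with ThreadPoolExecutor(max_workers=MAX_LLM_WORKERS) as pool:
        return list(pool.map(
            lambda item: resolve_one_citation(item, context_by_source_id[item['source_id']]),
            unresolved,
        ))
```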
LLM Fallback Architecture
The LLM fallback handles garbled OCR where substring matching fails:
LLM Prompt Structure
The prompt explains that OCR of handwritten documents produces garbled text:
"Muñoz Bermúdez" might appear as "Locanto mu Berdas"
"inscripción primera" might appear as "inscripclon primcra"
The LLM identifies which tokens contain the value, despite distortion.
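A condensed sketch of what such a prompt might look like; the wording is illustrative, not the production prompt:

```python
FALLBACK_PROMPT = """\
You are matching an extracted value against OCR text from a handwritten document.
OCR of handwriting is often garbled: "Muñoz Bermúdez" may appear as
"Locanto mu Berdas", and "inscripción primera" as "inscripclon primcra".

Value to locate: {field_value}
Numbered tokens from the claimed line ± 5 lines:
{token_lines}

Return the IDs of the tokens that contain this value despite the distortion,
or an empty list if the value is not present.
"""
```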
Page-Scoped Context
To reduce token usage (~90% reduction), the LLM receives only:
- Target page ± 1 adjacent pages
- Tokens from claimed line ± 5 lines
```python
def fetch_document_catalog_for_page(conn, doc_id, target_page):
    """Fetch line-indexed catalog for target page ± 1."""
    # Only 3 pages max instead of the entire document.
    pages = range(max(1, target_page - 1), target_page + 2)
    # fetch_page_lines (illustrative name) runs one per-page query on conn.
    return {page: fetch_page_lines(conn, doc_id, page) for page in pages}
```
Bounding Box Resolution
Once tokens are identified, bounding boxes are extracted:
Coordinate Systems
| Source | Format | Range |
|---|---|---|
| Gemini (visual) | `[ymin, xmin, ymax, xmax]` | 0-1000 |
| OCR Tokens | `{x, y, width, height}` | 0-1 normalized |
| Database | `{x, y, width, height}` | 0-1 normalized |
Conversion
```python
def convert_gemini_bbox_to_normalized(bbox):
    """Convert Gemini's 0-1000 coordinates to normalized 0-1."""
    ymin, xmin, ymax, xmax = bbox
    return {
        'x': xmin / 1000.0,
        'y': ymin / 1000.0,
        'width': (xmax - xmin) / 1000.0,
        'height': (ymax - ymin) / 1000.0,
    }
```
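For example, a Gemini box of `[320, 150, 340, 400]` converts to the same normalized bbox that appears in the output schema below:

```python
>>> convert_gemini_bbox_to_normalized([320, 150, 340, 400])
{'x': 0.15, 'y': 0.32, 'width': 0.25, 'height': 0.02}
```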
Validation
Bboxes are validated before storage (see the sketch after this list):
- Coordinates must be in valid range
- Inverted coordinates are auto-corrected
- Zero-area bboxes are rejected
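One plausible reading of these rules as code; whether out-of-range boxes are rejected outright or clamped is an assumption here:

```python
def validate_bbox(bbox):
    """Auto-correct inverted boxes; reject zero-area or out-of-range ones."""
    x, y, w, h = bbox['x'], bbox['y'], bbox['width'], bbox['height']
    if w < 0:               # inverted horizontally: auto-correct
        x, w = x + w, -w
    if h < 0:               # inverted vertically: auto-correct
        y, h = y + h, -h
    if w == 0 or h == 0:    # zero-area: reject
        return None
    if not (0.0 <= x and 0.0 <= y and x + w <= 1.0 and y + h <= 1.0):
        return None         # outside the normalized 0-1 range: reject
    return {'x': x, 'y': y, 'width': w, 'height': h}
```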
Visual Evidence Handling
For multimodal extraction where the LLM reads directly from PDF images:
| Priority | Method | Confidence | Description |
|---|---|---|---|
| 1 | visual_direct_bbox | 0.9 | Direct bbox from Gemini |
| 2 | visual_ocr_match | 0.7 | Found visual quote in OCR text |
| 3 | visual_page_fallback | 0.4 | Highlight entire claimed page |
| 4 | visual_no_ocr | 0.2 | No OCR lines found |
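The cascade reads as a first-match-wins chain. In this sketch, `find_quote_in_ocr` is an illustrative helper returning the matched OCR bboxes or None:

```python
FULL_PAGE_BBOX = {'x': 0.0, 'y': 0.0, 'width': 1.0, 'height': 1.0}

def resolve_visual_evidence(item, ocr_lines):
    """Return (match_method, confidence, bboxes) for a visual evidence item."""
    if item.get('bbox'):                                   # 1. Gemini gave a bbox
        return 'visual_direct_bbox', 0.9, [item['bbox']]
    bboxes = find_quote_in_ocr(item['quote'], ocr_lines)   # 2. quote found in OCR
    if bboxes:
        return 'visual_ocr_match', 0.7, bboxes
    if ocr_lines:                                          # 3. highlight whole page
        return 'visual_page_fallback', 0.4, [FULL_PAGE_BBOX]
    return 'visual_no_ocr', 0.2, []                        # 4. no OCR lines at all
```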
Output Schema
Each resolved evidence source is stored in the `evidence_sources` table:

```json
{
  "source_id": "uuid",
  "extraction_id": "uuid",
  "field_path": "titulares[0].name",
  "field_value": "Juan Pérez García",
  "doc_id": "uuid",
  "page_no": 2,
  "line_id": "D1-P2-L045",
  "bboxes": [
    {"x": 0.15, "y": 0.32, "width": 0.25, "height": 0.02}
  ],
  "match_method": "exact",
  "match_confidence": 1.0
}
```
Performance Metrics
Typical resolution statistics:
| Metric | Value |
|---|---|
| Deterministic success rate | ~75% |
| LLM fallback success rate | ~85% of remaining |
| Overall resolution rate | ~96% |
| Avg LLM fallback latency | 2-3s per call |
| Parallel batch time | ~15s for 50 evidence items |
Langfuse Integration
Each LLM fallback call is traced with:
- `evidence.llm_fallback` generation span
- Field path and claimed line metadata
- Token usage metrics
- Match result scoring (see the sketch after this list)
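A sketch of what that instrumentation might look like, assuming the Langfuse v2 Python SDK; the span name matches the list above, everything else is illustrative:

```python
from langfuse import Langfuse  # assuming the v2 Python SDK

langfuse = Langfuse()

def traced_llm_fallback(item, context):
    """Wrap one fallback call in an evidence.llm_fallback generation span."""
    trace = langfuse.trace(name="evidence-resolution")
    generation = trace.generation(
        name="evidence.llm_fallback",
        metadata={"field_path": item["field_path"], "claimed_line": item["line_id"]},
    )
    result = resolve_one_citation(item, context)  # illustrative, as in Phase 2
    generation.end(
        output={"match_method": result["match_method"],
                "match_confidence": result["match_confidence"]},
        # token usage metrics can also be attached via the usage parameter
    )
    return result
```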
Related Pages
- Extraction Pipeline - How evidence citations are generated
- Report Generation - How resolved evidence is used
- Data Model - `evidence_sources` table schema
- Observability - Langfuse tracing for evidence resolution