Evidence Resolution
The Evidence Resolver validates evidence citations from extraction and resolves them to pixel-level bounding boxes for UI visualization.
Lambda: coquititle-evidence-resolver
Technology: Deterministic matching + visual grounding + parallel LLM fallback
Three-Phase Processing
Evidence resolution uses a tiered approach to maximize accuracy while minimizing LLM costs:
Phase 1: Deterministic Matching (Serial)
Fast, cost-free matching attempted first:
| Method | Confidence | Description |
|---|---|---|
| exact | 1.0 | Quote found as exact substring in claimed line |
| fuzzy_full_line | 0.8 | Normalized text match (lowercase, collapsed whitespace) |
| nearby_exact | 1.0 | Exact match found in adjacent line (±2 lines) |
| nearby_fuzzy | 0.8 | Fuzzy match in adjacent line |
```python
import re

# Normalization for fuzzy matching
def normalize_text(text):
    """Lowercase and collapse whitespace runs to single spaces."""
    text = text.lower()
    text = re.sub(r'\s+', ' ', text)
    return text.strip()
```
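Conceptually, the four strategies are tried in order of cost. A minimal sketch of that cascade follows, assuming lines are held as a list of strings indexed by line number; `match_line` and `find_match` are illustrative names, not the Lambda's actual code:

```python
def match_line(quote, line_text):
    """Try exact, then fuzzy, matching against a single line."""
    if quote in line_text:
        return 'exact', 1.0
    if normalize_text(quote) in normalize_text(line_text):
        return 'fuzzy_full_line', 0.8
    return None

def find_match(quote, lines, claimed_idx):
    """Check the claimed line first, then adjacent lines (±2)."""
    hit = match_line(quote, lines[claimed_idx])
    if hit:
        return hit
    for offset in (-1, 1, -2, 2):  # nearest lines first
        idx = claimed_idx + offset
        if 0 <= idx < len(lines):
            hit = match_line(quote, lines[idx])
            if hit:
                method, _ = hit
                return ('nearby_exact', 1.0) if method == 'exact' else ('nearby_fuzzy', 0.8)
    return None
```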
Phase 1.5: Visual Grounding (Parallel)
When deterministic matching fails, visual grounding uses Gemini to visually locate the extracted value on the PDF page image. This is particularly effective when OCR text doesn't match due to formatting or rendering differences, but the value is clearly visible on the page.
| Method | Confidence | Description |
|---|---|---|
| visual_grounding | 0.8 | Gemini visually locates value on PDF page |
Parallel Processing:
- Up to 12 concurrent visual grounding calls (`MAX_VG_WORKERS = 12`)
- Circuit breaker limits to 100 attempts per invocation to control API costs
- Pre-fetched PDF bytes and Gemini caches minimize database access in threads (see the sketch after this list)
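A minimal sketch of the fan-out with the circuit breaker, assuming an illustrative `ground_one(item, pdf_bytes)` helper that calls Gemini and returns a bbox or None:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_VG_WORKERS = 12
MAX_VG_ATTEMPTS = 100  # circuit breaker: cap Gemini calls per invocation

def run_visual_grounding(unresolved, pdf_bytes_by_doc):
    """Ground up to MAX_VG_ATTEMPTS items in parallel.

    Items beyond the cap fall through to Phase 2. ground_one is an
    illustrative helper, not the Lambda's actual function."""
    attempts = unresolved[:MAX_VG_ATTEMPTS]
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_VG_WORKERS) as pool:
        futures = {
            pool.submit(ground_one, item, pdf_bytes_by_doc[item['doc_id']]): item
            for item in attempts
        }
        for future in as_completed(futures):
            results[futures[future]['source_id']] = future.result()  # bbox dict or None
    return results
```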
Per-Document Caching:
To avoid re-uploading the same PDF for multiple evidence references on the same document, visual grounding uses per-document Gemini context caching:
- PDF pages are uploaded once and cached for 15 minutes
- Subsequent evidence items on the same document reuse the cached context
- LRU eviction with max 10 cached documents per invocation (sketched below)
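A sketch of that cache policy under the stated constraints; `create_context` stands in for the upload-and-cache call and is an assumption, not the actual helper:

```python
import time
from collections import OrderedDict

CACHE_TTL_SECONDS = 15 * 60   # Gemini context cache lifetime
MAX_CACHED_DOCS = 10          # LRU capacity per invocation

_doc_cache = OrderedDict()    # doc_id -> (cached_context, created_at)

def get_cached_context(doc_id, create_context):
    """Reuse a per-document Gemini cache if still fresh, else create and LRU-evict."""
    entry = _doc_cache.get(doc_id)
    if entry and time.time() - entry[1] < CACHE_TTL_SECONDS:
        _doc_cache.move_to_end(doc_id)      # mark as most recently used
        return entry[0]
    context = create_context(doc_id)        # upload PDF pages once
    _doc_cache[doc_id] = (context, time.time())
    if len(_doc_cache) > MAX_CACHED_DOCS:
        _doc_cache.popitem(last=False)      # evict least recently used
    return context
```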
Phase 2: LLM Fallback (Parallel)
For evidence that fails both deterministic matching and visual grounding (common with garbled OCR from handwritten documents where the value isn't visually identifiable):
| Method | Confidence | Description |
|---|---|---|
| llm | 0.85 | LLM-based token matching |
| approximate | 0.3 | Last resort: highlight claimed line ± 1 |
| failed | 0.0 | No match found |
Parallel Processing:
- Up to 12 concurrent LLM calls (`MAX_LLM_WORKERS = 12`)
- Each call resolves one evidence citation
- Pre-fetched tokens and page context for thread safety (see the sketch after this list)
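The fan-out mirrors Phase 1.5. A compact sketch, in which `resolve_one_citation` and the pre-fetched `context_by_source_id` mapping are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_LLM_WORKERS = 12

def run_llm_fallback(unresolved, context_by_source_id):
    """Resolve each remaining citation with one LLM call per item.

    Tokens and page context are pre-fetched into context_by_source_id so
    worker threads never touch the database connection."""
    with ThreadPoolExecutor(max_workers=MAX_LLM_WORKERS) as pool:
        return list(pool.map(
            lambda item: resolve_one_citation(item, context_by_source_id[item['source_id']]),
            unresolved,
        ))
```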
LLM Fallback Architecture
The LLM fallback handles garbled OCR where substring matching fails:
LLM Prompt Structure
The prompt explains that OCR of handwritten documents produces garbled text:
"Muñoz Bermúdez" might appear as "Locanto mu Berdas"
"inscripción primera" might appear as "inscripclon primcra"
The LLM identifies which tokens contain the value, despite distortion.
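A condensed sketch of what such a prompt might look like; the wording is illustrative, not the production prompt:

```python
FALLBACK_PROMPT = """\
You are matching an extracted value against OCR text from a handwritten document.
OCR of handwriting is often garbled: "Muñoz Bermúdez" may appear as
"Locanto mu Berdas", and "inscripción primera" as "inscripclon primcra".

Value to locate: {field_value}
Numbered tokens from the claimed line ± 5 lines:
{token_lines}

Return the IDs of the tokens that contain this value despite the distortion,
or an empty list if the value is not present.
"""
```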
Page-Scoped Context
To reduce token usage (~90% reduction), the LLM receives only:
- Target page ± 1 adjacent pages
- Tokens from claimed line ± 5 lines
```python
def fetch_document_catalog_for_page(conn, doc_id, target_page):
    """Fetch line-indexed catalog for target page ± 1."""
    # Only 3 pages max instead of the entire document.
    pages = range(max(1, target_page - 1), target_page + 2)
    # fetch_page_lines (illustrative name) runs one per-page query on conn.
    return {page: fetch_page_lines(conn, doc_id, page) for page in pages}
```
Bounding Box Resolution
Once tokens are identified, bounding boxes are extracted:
Coordinate Systems
| Source | Format | Range |
|---|---|---|
| Gemini (visual) | `[ymin, xmin, ymax, xmax]` | 0-1000 |
| OCR Tokens | `{x, y, width, height}` | 0-1 normalized |
| Database | `{x, y, width, height}` | 0-1 normalized |
Conversion
```python
def convert_gemini_bbox_to_normalized(bbox):
    """Convert Gemini's 0-1000 coordinates to normalized 0-1."""
    ymin, xmin, ymax, xmax = bbox
    return {
        'x': xmin / 1000.0,
        'y': ymin / 1000.0,
        'width': (xmax - xmin) / 1000.0,
        'height': (ymax - ymin) / 1000.0,
    }
```
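For example, a Gemini box of `[320, 150, 340, 400]` converts to the same normalized bbox that appears in the output schema below:

```python
>>> convert_gemini_bbox_to_normalized([320, 150, 340, 400])
{'x': 0.15, 'y': 0.32, 'width': 0.25, 'height': 0.02}
```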
Validation
Bboxes are validated before storage (see the sketch after this list):
- Coordinates must be in valid range
- Inverted coordinates are auto-corrected
- Zero-area bboxes are rejected
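One plausible reading of these rules as code; whether out-of-range boxes are rejected outright or clamped is an assumption here:

```python
def validate_bbox(bbox):
    """Auto-correct inverted boxes; reject zero-area or out-of-range ones."""
    x, y, w, h = bbox['x'], bbox['y'], bbox['width'], bbox['height']
    if w < 0:               # inverted horizontally: auto-correct
        x, w = x + w, -w
    if h < 0:               # inverted vertically: auto-correct
        y, h = y + h, -h
    if w == 0 or h == 0:    # zero-area: reject
        return None
    if not (0.0 <= x and 0.0 <= y and x + w <= 1.0 and y + h <= 1.0):
        return None         # outside the normalized 0-1 range: reject
    return {'x': x, 'y': y, 'width': w, 'height': h}
```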
Visual Evidence Handling
For multimodal extraction where the LLM reads directly from PDF images:
| Priority | Method | Confidence | Description |
|---|---|---|---|
| 1 | visual_direct_bbox | 0.9 | Direct bbox from Gemini |
| 2 | visual_ocr_match | 0.7 | Found visual quote in OCR text |
| 3 | visual_page_fallback | 0.4 | Highlight entire claimed page |
| 4 | visual_no_ocr | 0.2 | No OCR lines found |
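The cascade reads as a first-match-wins chain. In this sketch, `find_quote_in_ocr` is an illustrative helper returning the matched OCR bboxes or None:

```python
FULL_PAGE_BBOX = {'x': 0.0, 'y': 0.0, 'width': 1.0, 'height': 1.0}

def resolve_visual_evidence(item, ocr_lines):
    """Return (match_method, confidence, bboxes) for a visual evidence item."""
    if item.get('bbox'):                                   # 1. Gemini gave a bbox
        return 'visual_direct_bbox', 0.9, [item['bbox']]
    bboxes = find_quote_in_ocr(item['quote'], ocr_lines)   # 2. quote found in OCR
    if bboxes:
        return 'visual_ocr_match', 0.7, bboxes
    if ocr_lines:                                          # 3. highlight whole page
        return 'visual_page_fallback', 0.4, [FULL_PAGE_BBOX]
    return 'visual_no_ocr', 0.2, []                        # 4. no OCR lines at all
```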
Output Schema
Each resolved evidence source is stored in the `evidence_sources` table:

```json
{
  "source_id": "uuid",
  "extraction_id": "uuid",
  "field_path": "titulares[0].name",
  "field_value": "Juan Pérez García",
  "doc_id": "uuid",
  "page_no": 2,
  "line_id": "D1-P2-L045",
  "bboxes": [
    {"x": 0.15, "y": 0.32, "width": 0.25, "height": 0.02}
  ],
  "match_method": "exact",
  "match_confidence": 1.0
}
```
Performance Metrics
Typical resolution statistics:
| Metric | Value |
|---|---|
| Deterministic success rate | ~75% |
| LLM fallback success rate | ~85% of remaining |
| Overall resolution rate | ~96% |
| Avg LLM fallback latency | 2-3s per call |
| Parallel batch time | ~15s for 50 evidence items |
Langfuse Integration
Each LLM fallback call is traced with:
- `evidence.llm_fallback` generation span
- Field path and claimed line metadata
- Token usage metrics
- Match result scoring (see the sketch after this list)
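A sketch of what that instrumentation might look like, assuming the Langfuse v2 Python SDK; the span name matches the list above, everything else is illustrative:

```python
from langfuse import Langfuse  # assuming the v2 Python SDK

langfuse = Langfuse()

def traced_llm_fallback(item, context):
    """Wrap one fallback call in an evidence.llm_fallback generation span."""
    trace = langfuse.trace(name="evidence-resolution")
    generation = trace.generation(
        name="evidence.llm_fallback",
        metadata={"field_path": item["field_path"], "claimed_line": item["line_id"]},
    )
    result = resolve_one_citation(item, context)  # illustrative, as in Phase 2
    generation.end(
        output={"match_method": result["match_method"],
                "match_confidence": result["match_confidence"]},
        # token usage metrics can also be attached via the usage parameter
    )
    return result
```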
Related Pages
- Extraction Pipeline - How evidence citations are generated
- Report Generation - How resolved evidence is used
- Data Model - `evidence_sources` table schema
- Observability - Langfuse tracing for evidence resolution