Data Model

CoquiTitle data is stored in the ridpr schema in Supabase PostgreSQL.

Core Tables

cases

Main table for title study requests.

Column	Type	Description
`case_id`	uuid	Primary key
`org_id`	uuid	Organization (multi-tenant)
`created_by`	uuid	User who created the case
`finca_id`	text	Property finca number
`property_address`	text	Property address
`demarcacion_code`	text	Registry demarcation code
`status`	text	Processing status
`error_message`	text	Error details if failed
`document_count`	integer	Number of uploaded documents
`total_pages`	integer	Total pages across documents
`ocr_mode`	text	OCR processing mode
`extractor_model`	text	LLM model for extraction
`report_model`	text	LLM model for report generation
`current_run_id`	uuid	Active pipeline run ID
`pending_docs_raw_json`	jsonb	Raw pending presentations data
`pending_docs_normalized_json`	jsonb	Processed pending docs
`langfuse_trace_id`	text	Observability trace ID
`created_at`	timestamptz	Creation timestamp
`updated_at`	timestamptz	Last update timestamp

documents

Uploaded PDF documents for a case.

Column	Type	Description
`doc_id`	uuid	Primary key
`case_id`	uuid	FK to cases
`filename`	text	Original filename
`s3_key`	text	S3 storage path
`page_count`	integer	Number of pages
`uploaded_at`	timestamptz	Upload timestamp

extractions

Structured data extracted from documents.

Column	Type	Description
`extraction_id`	uuid	Primary key
`case_id`	uuid	FK to cases
`run_id`	uuid	Pipeline run ID
`schema_json`	jsonb	Extracted structured data
`pass1_schema_json`	jsonb	Pass 1 intermediate results
`is_pass1_only`	boolean	Partial extraction flag
`evidence_metadata`	jsonb	Alias → canonical ID mappings
`title_state`	jsonb	Derived title state (see below)
`derived_current_rights`	jsonb	Computed ownership rights (canonical if confident)
`model_version`	text	LLM model used
`input_tokens`	integer	Total input tokens
`output_tokens`	integer	Total output tokens
`thinking_tokens`	integer	Thinking/reasoning tokens
`extraction_duration_ms`	integer	Processing time

title_state Structure (tsb_v2)

The title_state column contains the deterministic derivation output from the Title State Builder:

{
  "version": "tsb_v2",
  "chain_of_title": [
    {
      "entry_type": "acquisition",
      "original_index": 0,
      "inscription_number": "5ta",
      "date_iso": "1977-12-29",
      "data": { /* original acquisition object */ }
    },
    {
      "entry_type": "event",
      "original_index": 0,
      "inscription_number": "7ma",
      "date_iso": null,
      "data": { "event_type": "death", "person": "Santiago Delgado" }
    }
  ],
  "confidence_summary": {
    "status": "confident",  // or "needs_review", "unknown"
    "reasons": []
  },
  "review_flags": [
    {
      "issue": "requires_partition_for_transfer_or_mortgage",
      "details": "Patrimonio indiviso: requiere partición...",
      "severity": "warn"
    }
  ],
  "derivation_map": {
    "derived_current_rights": {
      "rule": "rights_derivation_v1",
      "inputs": ["acquisitions", "events", "titulares", "is_patrimonio_indiviso"]
    },
    "chain_of_title": {
      "rule": "build_chain_of_title_v1",
      "inputs": ["acquisitions", "events"]
    }
  }
}

Chain of Title Entry Types:

entry_type	Description
`acquisition`	Property purchase, inheritance, donation
`event`	Non-acquisition title impacts (death, partition, divorce)

Event Types:

event_type	Description
`death`	Causante dies, triggers inheritance
`partition`	Heirs divide estate shares
`declaratoria_de_herederos`	Judicial declaration of heirs
`cuota_viudal_usufructuaria`	Widow's usufruct rights
`divorce`	Community property dissolution

Confidence Status:

Status	Meaning
`confident`	No review flags; canonical rights persisted
`needs_review`	Has warnings but no errors
`unknown`	Has errors or empty candidate rights

evidence_sources

Links extracted fields to source document locations.

Column	Type	Description
`source_id`	uuid	Primary key
`extraction_id`	uuid	FK to extractions
`case_id`	uuid	FK to cases
`run_id`	uuid	Pipeline run ID
`field_path`	text	JSON path (e.g., `titulares[0].name`)
`field_value`	text	Extracted value
`doc_id`	uuid	FK to documents
`page_no`	integer	Page number
`line_id`	text	OCR line reference
`bboxes`	jsonb	Array of bounding boxes
`match_method`	text	How evidence was matched
`match_confidence`	float	Confidence score (0-1)

Match Methods:

Method	Confidence	Description
`exact`	1.0	Quote found as exact substring
`fuzzy_full_line`	0.8	Normalized text match
`nearby_exact`	1.0	Found in adjacent line (±2)
`nearby_fuzzy`	0.8	Fuzzy match in adjacent line
`llm`	0.85	LLM-based token matching
`approximate`	0.3	Best-effort fallback
`failed`	0.0	No match found

reports

Generated title study reports.

Column	Type	Description
`report_id`	uuid	Primary key
`case_id`	uuid	FK to cases
`run_id`	uuid	Pipeline run ID
`lang`	text	Report language
`report_json`	jsonb	Full report structure
`s3_uri`	text	JSON storage path
`pdf_s3_uri`	text	PDF storage path
`model_version`	text	LLM model used
`report_duration_ms`	integer	Report gen time
`report_input_tokens`	integer	Report input tokens
`report_output_tokens`	integer	Report output tokens

OCR Tables

pages

Page-level OCR results.

Column	Type	Description
`page_id`	uuid	Primary key
`doc_id`	uuid	FK to documents
`page_no`	integer	Page number
`width`	integer	Page width in pixels
`height`	integer	Page height in pixels
`text`	text	Full page text
`ocr_done_at`	timestamptz	OCR completion time

ocr_tokens

Word-level tokens with bounding boxes.

Column	Type	Description
`token_id`	uuid	Primary key
`doc_id`	uuid	FK to documents
`page_no`	integer	Page number
`text`	text	Token text
`bbox`	jsonb	`{x, y, width, height}` normalized 0-1
`confidence`	float	OCR confidence
`start_index`	integer	Text anchor start
`end_index`	integer	Text anchor end
`line_id`	text	Reference to ocr_lines

ocr_lines

Line-level text for evidence resolution.

Column	Type	Description
`line_id`	text	Primary key (format: `{doc_uuid}-P{page}-L{line:03d}`)
`doc_id`	uuid	FK to documents
`page_no`	integer	Page number
`line_no`	integer	Line number on page
`text`	text	Line text
`start_index`	integer	Text anchor start
`end_index`	integer	Text anchor end
`bbox`	jsonb	Line bounding box

Pending Documents Tables

pending_presentations

Scraped "documentos presentados" from Karibe.

Column	Type	Description
`presentation_id`	uuid	Primary key
`asiento_karibe`	text	Karibe asiento number
`finca`	text	Property finca number
`demarcacion_code`	text	Registry demarcation
`doc_kind`	text	Document type
`pdf_url`	text	Source PDF URL
`s3_key`	text	Archived PDF path
`scraped_at`	timestamptz	When scraped

pending_docs_cache

Cache for pending document processing.

Column	Type	Description
`cache_key`	text	Primary key (asiento + hash)
`case_id`	uuid	FK to cases
`ocr_text`	text	Extracted OCR text
`extraction_json`	jsonb	Extracted structured data
`processed_at`	timestamptz	Processing timestamp

Supporting Tables

case_events

Real-time progress events for UI updates.

Column	Type	Description
`event_id`	uuid	Primary key
`case_id`	uuid	FK to cases
`run_id`	uuid	Pipeline run ID
`step`	text	Pipeline step name
`message`	text	Progress message
`progress`	integer	Percentage (0-100)
`metadata`	jsonb	Additional event data
`created_at`	timestamptz	Event timestamp

case_shares

Report sharing links.

Column	Type	Description
`share_id`	uuid	Primary key
`case_id`	uuid	FK to cases
`created_by`	uuid	User who created share
`expires_at`	timestamptz	Expiration time
`revoked_at`	timestamptz	Revocation time
`view_count`	integer	Number of views
`last_viewed_at`	timestamptz	Last view time

embargos

Lien and embargo records from Karibe.

note

See Embargos System for detailed documentation.

Entity Relationship Diagram

Run ID Tracking

Each pipeline execution is tagged with a run_id (UUID) to enable re-runs without polluting history:

Table	Column	Description
`cases`	`current_run_id`	The active run for this case
`extractions`	`run_id`	Which run produced this extraction
`evidence_sources`	`run_id`	Which run produced these citations
`reports`	`run_id`	Which run produced this report
`case_events`	`run_id`	Which run emitted this event

When re-running a case, a new run_id is generated, and cases.current_run_id is updated. Historical data from previous runs is preserved for auditing.

System Overview - High-level architecture
Extraction Pipeline - How extraction data is generated
Evidence Resolution - evidence_sources table usage
Report Generation - reports table usage
Observability - Langfuse tracing

Core Tables​

cases​

documents​

extractions​

title_state Structure (tsb_v2)​

evidence_sources​

reports​

OCR Tables​

pages​

ocr_tokens​

ocr_lines​

Pending Documents Tables​

pending_presentations​

pending_docs_cache​

Supporting Tables​

case_events​

case_shares​

embargos​

Entity Relationship Diagram​

Run ID Tracking​

Related Pages​