Data Model
CoquiTitle data is stored in the ridpr schema in Supabase PostgreSQL.
Core Tables
cases
Main table for title study requests.
| Column | Type | Description |
|---|---|---|
case_id | uuid | Primary key |
org_id | uuid | Organization (multi-tenant) |
created_by | uuid | User who created the case |
finca_id | text | Property finca number |
property_address | text | Property address |
demarcacion_code | text | Registry demarcation code |
status | text | Processing status |
error_message | text | Error details if failed |
document_count | integer | Number of uploaded documents |
total_pages | integer | Total pages across documents |
ocr_mode | text | OCR processing mode |
extractor_model | text | LLM model for extraction |
report_model | text | LLM model for report generation |
current_run_id | uuid | Active pipeline run ID |
pending_docs_raw_json | jsonb | Raw pending presentations data |
pending_docs_normalized_json | jsonb | Processed pending docs |
langfuse_trace_id | text | Observability trace ID |
created_at | timestamptz | Creation timestamp |
updated_at | timestamptz | Last update timestamp |
documents
Uploaded PDF documents for a case.
| Column | Type | Description |
|---|---|---|
doc_id | uuid | Primary key |
case_id | uuid | FK to cases |
filename | text | Original filename |
s3_key | text | S3 storage path |
page_count | integer | Number of pages |
uploaded_at | timestamptz | Upload timestamp |
extractions
Structured data extracted from documents.
| Column | Type | Description |
|---|---|---|
extraction_id | uuid | Primary key |
case_id | uuid | FK to cases |
run_id | uuid | Pipeline run ID |
schema_json | jsonb | Extracted structured data |
pass1_schema_json | jsonb | Pass 1 intermediate results |
is_pass1_only | boolean | Partial extraction flag |
evidence_metadata | jsonb | Alias → canonical ID mappings |
title_state | jsonb | Derived title state (see below) |
derived_current_rights | jsonb | Computed ownership rights (canonical if confident) |
model_version | text | LLM model used |
input_tokens | integer | Total input tokens |
output_tokens | integer | Total output tokens |
thinking_tokens | integer | Thinking/reasoning tokens |
extraction_duration_ms | integer | Processing time |
title_state Structure (tsb_v2)
The title_state column contains the deterministic derivation output from the Title State Builder:
{
"version": "tsb_v2",
"chain_of_title": [
{
"entry_type": "acquisition",
"original_index": 0,
"inscription_number": "5ta",
"date_iso": "1977-12-29",
"data": { /* original acquisition object */ }
},
{
"entry_type": "event",
"original_index": 0,
"inscription_number": "7ma",
"date_iso": null,
"data": { "event_type": "death", "person": "Santiago Delgado" }
}
],
"confidence_summary": {
"status": "confident", // or "needs_review", "unknown"
"reasons": []
},
"review_flags": [
{
"issue": "requires_partition_for_transfer_or_mortgage",
"details": "Patrimonio indiviso: requiere partición...",
"severity": "warn"
}
],
"derivation_map": {
"derived_current_rights": {
"rule": "rights_derivation_v1",
"inputs": ["acquisitions", "events", "titulares", "is_patrimonio_indiviso"]
},
"chain_of_title": {
"rule": "build_chain_of_title_v1",
"inputs": ["acquisitions", "events"]
}
}
}
Chain of Title Entry Types:
| entry_type | Description |
|---|---|
acquisition | Property purchase, inheritance, donation |
event | Non-acquisition title impacts (death, partition, divorce) |
Event Types:
| event_type | Description |
|---|---|
death | Causante dies, triggers inheritance |
partition | Heirs divide estate shares |
declaratoria_de_herederos | Judicial declaration of heirs |
cuota_viudal_usufructuaria | Widow's usufruct rights |
divorce | Community property dissolution |
Confidence Status:
| Status | Meaning |
|---|---|
confident | No review flags; canonical rights persisted |
needs_review | Has warnings but no errors |
unknown | Has errors or empty candidate rights |
evidence_sources
Links extracted fields to source document locations.
| Column | Type | Description |
|---|---|---|
source_id | uuid | Primary key |
extraction_id | uuid | FK to extractions |
case_id | uuid | FK to cases |
run_id | uuid | Pipeline run ID |
field_path | text | JSON path (e.g., titulares[0].name) |
field_value | text | Extracted value |
doc_id | uuid | FK to documents |
page_no | integer | Page number |
line_id | text | OCR line reference |
bboxes | jsonb | Array of bounding boxes |
match_method | text | How evidence was matched |
match_confidence | float | Confidence score (0-1) |
Match Methods:
| Method | Confidence | Description |
|---|---|---|
exact | 1.0 | Quote found as exact substring |
fuzzy_full_line | 0.8 | Normalized text match |
nearby_exact | 1.0 | Found in adjacent line (±2) |
nearby_fuzzy | 0.8 | Fuzzy match in adjacent line |
llm | 0.85 | LLM-based token matching |
approximate | 0.3 | Best-effort fallback |
failed | 0.0 | No match found |
reports
Generated title study reports.
| Column | Type | Description |
|---|---|---|
report_id | uuid | Primary key |
case_id | uuid | FK to cases |
run_id | uuid | Pipeline run ID |
lang | text | Report language |
report_json | jsonb | Full report structure |
s3_uri | text | JSON storage path |
pdf_s3_uri | text | PDF storage path |
model_version | text | LLM model used |
report_duration_ms | integer | Report gen time |
report_input_tokens | integer | Report input tokens |
report_output_tokens | integer | Report output tokens |
OCR Tables
pages
Page-level OCR results.
| Column | Type | Description |
|---|---|---|
page_id | uuid | Primary key |
doc_id | uuid | FK to documents |
page_no | integer | Page number |
width | integer | Page width in pixels |
height | integer | Page height in pixels |
text | text | Full page text |
ocr_done_at | timestamptz | OCR completion time |
ocr_tokens
Word-level tokens with bounding boxes.
| Column | Type | Description |
|---|---|---|
token_id | uuid | Primary key |
doc_id | uuid | FK to documents |
page_no | integer | Page number |
text | text | Token text |
bbox | jsonb | {x, y, width, height} normalized 0-1 |
confidence | float | OCR confidence |
start_index | integer | Text anchor start |
end_index | integer | Text anchor end |
line_id | text | Reference to ocr_lines |
ocr_lines
Line-level text for evidence resolution.
| Column | Type | Description |
|---|---|---|
line_id | text | Primary key (format: {doc_uuid}-P{page}-L{line:03d}) |
doc_id | uuid | FK to documents |
page_no | integer | Page number |
line_no | integer | Line number on page |
text | text | Line text |
start_index | integer | Text anchor start |
end_index | integer | Text anchor end |
bbox | jsonb | Line bounding box |
Pending Documents Tables
pending_presentations
Scraped "documentos presentados" from Karibe.
| Column | Type | Description |
|---|---|---|
presentation_id | uuid | Primary key |
asiento_karibe | text | Karibe asiento number |
finca | text | Property finca number |
demarcacion_code | text | Registry demarcation |
doc_kind | text | Document type |
pdf_url | text | Source PDF URL |
s3_key | text | Archived PDF path |
scraped_at | timestamptz | When scraped |
pending_docs_cache
Cache for pending document processing.
| Column | Type | Description |
|---|---|---|
cache_key | text | Primary key (asiento + hash) |
case_id | uuid | FK to cases |
ocr_text | text | Extracted OCR text |
extraction_json | jsonb | Extracted structured data |
processed_at | timestamptz | Processing timestamp |
Supporting Tables
case_events
Real-time progress events for UI updates.
| Column | Type | Description |
|---|---|---|
event_id | uuid | Primary key |
case_id | uuid | FK to cases |
run_id | uuid | Pipeline run ID |
step | text | Pipeline step name |
message | text | Progress message |
progress | integer | Percentage (0-100) |
metadata | jsonb | Additional event data |
created_at | timestamptz | Event timestamp |
case_shares
Report sharing links.
| Column | Type | Description |
|---|---|---|
share_id | uuid | Primary key |
case_id | uuid | FK to cases |
created_by | uuid | User who created share |
expires_at | timestamptz | Expiration time |
revoked_at | timestamptz | Revocation time |
view_count | integer | Number of views |
last_viewed_at | timestamptz | Last view time |
embargos
Lien and embargo records from Karibe.
See Embargos System for detailed documentation.
Entity Relationship Diagram
Run ID Tracking
Each pipeline execution is tagged with a run_id (UUID) to enable re-runs without polluting history:
| Table | Column | Description |
|---|---|---|
cases | current_run_id | The active run for this case |
extractions | run_id | Which run produced this extraction |
evidence_sources | run_id | Which run produced these citations |
reports | run_id | Which run produced this report |
case_events | run_id | Which run emitted this event |
When re-running a case, a new run_id is generated, and cases.current_run_id is updated. Historical data from previous runs is preserved for auditing.
Related Pages
- System Overview - High-level architecture
- Extraction Pipeline - How extraction data is generated
- Evidence Resolution -
evidence_sourcestable usage - Report Generation -
reportstable usage - Observability - Langfuse tracing