PuertoRicoE Integration

This integration processes real estate transactions from puertoricoe.com ("Comparables Online"), a transaction database operated by the company behind Clasificados (a Puerto Rican classifieds site comparable to Craigslist).

System Overview

Apify Scraper --> Webhook --> Ingestor --> SQS --> Processor --> Supabase
                                                       |
                                                       v
                                            Cloud Run (AI Scoring)

Components

AWS Lambdas

Lambda      Purpose
ingestor    Receives webhook from Apify, sends to SQS
fetcher     Fetches datasets from Apify API
processor   Main pipeline (transform, score, enrich, upsert)
authorizer  API Gateway authentication

GCP Cloud Run

Service    Purpose
ai-scorer  Scores buyers/sellers as business entities vs individuals

Data Flow

Apify (scrapes puertoricoe.com)
| Webhook
v
Ingestor Lambda
| SQS (batch: 250)
v
Processor Lambda
| Extract unique buyer/seller names
v
Cloud Run (ai-scorer)
| Vertex AI Gemini 2.5 Flash Lite
| Returns business entity scores (0-1)
v
Processor Lambda
| Enrich with location data
| Upsert with scores
v
Supabase (puertoricoe_transactions)
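
The "Extract unique buyer/seller names" step in the flow above can be sketched as follows. This is a minimal illustration, not the processor's actual code: the function name `unique_party_names` is hypothetical, the `buyer` and `seller` keys are taken from the schema in this document, and exact-match deduplication after trimming whitespace is an assumption.

```python
from typing import Iterable, Mapping


def unique_party_names(records: Iterable[Mapping[str, object]]) -> list[str]:
    """Collect distinct, non-empty buyer and seller names in first-seen order."""
    seen: dict[str, None] = {}  # dict keeps insertion order, so it doubles as an ordered set
    for record in records:
        for field in ("buyer", "seller"):
            value = record.get(field)
            if isinstance(value, str) and value.strip():
                seen.setdefault(value.strip(), None)
    return list(seen)


# Illustrative batch: the buyer repeats, so it is scored only once.
batch = [
    {"buyer": "ABC Properties LLC", "seller": "Sucesion de Juan Perez"},
    {"buyer": "ABC Properties LLC", "seller": "Estate of Maria Garcia"},
]
names = unique_party_names(batch)
```

Deduplicating before scoring matters because AI scoring dominates batch time, so each repeated name that is skipped saves a Cloud Run call.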

Processing Performance

Per 250-record batch:

  • AI scoring: 145-220s (parallel Cloud Run calls)
  • Database enrichment: 5-10s
  • Database upsert: 10-20s
  • Total: ~180-250 seconds

For 10,000 records:

  • 40 batches at 10 concurrent Lambdas
  • Total time: 13-15 minutes

Critical path: AI scoring (85% of processing time)
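
The 10,000-record estimate above can be checked with a short calculation. The sketch below assumes the Lambdas run in lock-step "waves" of 10, which is a simplification (real invocations overlap), and the function name is hypothetical. The defaults come from the figures in this section.

```python
import math


def estimate_total_seconds(records, batch_size=250, concurrency=10, batch_seconds=(180, 250)):
    """Return (batches, waves, low_seconds, high_seconds) for a lock-step wave model."""
    batches = math.ceil(records / batch_size)   # 10,000 / 250 = 40 batches
    waves = math.ceil(batches / concurrency)    # 40 / 10 = 4 waves
    low, high = batch_seconds
    return batches, waves, waves * low, waves * high


batches, waves, low_s, high_s = estimate_total_seconds(10_000)
print(f"{batches} batches, {waves} waves, {low_s / 60:.1f}-{high_s / 60:.1f} min")
# prints "40 batches, 4 waves, 12.0-16.7 min"
```

The 13-15 minute figure quoted above sits inside this 12.0-16.7 minute bracket, which is what one would expect when most batches land between the best and worst per-batch times.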

AI Business Entity Scoring

Purpose: Distinguish professional real estate investors from individuals and families.

Score Ranges:

Score    Entity Type
0.0      Individuals, family estates ("Sucesion de...", "Estate of...")
0.2-0.4  Ambiguous (personal trusts)
0.8-1.0  Business entities (LLC, Corp, Properties, Holdings)
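
For downstream filtering, the score ranges above could be mapped to labels as in this sketch. The function name and label strings are hypothetical, and because the documented bands leave gaps (for example 0.4-0.8), scores that fall outside them are reported as "undocumented" rather than guessed.

```python
def classify_score(score: float) -> str:
    """Map a business entity score (0-1) to the band documented in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within 0-1, got {score}")
    if score == 0.0:
        return "individual"
    if 0.2 <= score <= 0.4:
        return "ambiguous"
    if 0.8 <= score <= 1.0:
        return "business"
    return "undocumented"  # falls in a gap the table does not describe


labels = [classify_score(s) for s in (0.0, 0.3, 0.95, 0.6)]
```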

Examples:

"Sucesion de Juan Perez" --> 0.0
"Estate of Maria Garcia" --> 0.0
"ABC Properties LLC" --> 0.95
"First Bank PR" --> 0.95

Implementation:

  • Cloud Run service wraps Vertex AI (avoids Lambda's 250 MB unzipped deployment package limit)
  • Parallel requests to Cloud Run (2-3 concurrent per Lambda)
  • Non-blocking (continues processing if scoring fails)
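
The parallel, non-blocking behaviour described above might look like the following sketch. It is not the processor's actual code: `score_names` and `fake_scorer` are hypothetical, the real HTTP call to Cloud Run is replaced by an injected callable, and scoring one name per request is an assumption (the service may well accept chunks of names).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def score_names(
    names: list[str],
    scorer: Callable[[str], float],
    max_workers: int = 3,  # matches the "2-3 concurrent per Lambda" figure
) -> dict[str, Optional[float]]:
    """Score names in parallel; a failed call yields None instead of raising."""

    def safe_score(name: str) -> Optional[float]:
        try:
            return scorer(name)
        except Exception:
            return None  # non-blocking: the record is still upserted, just without a score

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(safe_score, names))  # map preserves input order
    return dict(zip(names, scores))


def fake_scorer(name: str) -> float:
    """Stand-in for the Cloud Run request, with one simulated failure."""
    if "Bank" in name:
        raise TimeoutError("simulated Cloud Run timeout")
    return 0.95 if "LLC" in name else 0.0


result = score_names(["ABC Properties LLC", "Sucesion de Juan Perez", "First Bank PR"], fake_scorer)
```

Returning `None` for failures keeps the score columns nullable in spirit: a missing score is distinguishable from a genuine 0.0.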

Database Schema

Table: public.puertoricoe_transactions

Column                 Type       Description
transaction_id         UUID       Primary key
buyer                  String     Buyer name
seller                 String     Seller name
buyer_business_score   Float      AI business entity score
seller_business_score  Float      AI business entity score
sales_price            Integer    Transaction amount
transaction_date       Date       Transaction date
muni_id                Integer    Municipality ID
barrio_id              Integer    Barrio ID
catastro_id            String     CRIM catastro number
geo                    Geography  PostGIS point
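
As an illustration of the upsert step against this table, the sketch below builds a PostgreSQL `INSERT ... ON CONFLICT` statement from the column list. It assumes the conflict target is `transaction_id` (the primary key) and a driver that accepts psycopg-style `%(name)s` placeholders. The helper name is hypothetical, and how the `geo` value is encoded is left out.

```python
COLUMNS = (
    "transaction_id", "buyer", "seller",
    "buyer_business_score", "seller_business_score",
    "sales_price", "transaction_date",
    "muni_id", "barrio_id", "catastro_id", "geo",
)


def build_upsert_sql(
    table: str = "public.puertoricoe_transactions",
    columns: tuple[str, ...] = COLUMNS,
    key: str = "transaction_id",
) -> str:
    """Build an upsert that overwrites every non-key column on conflict."""
    placeholders = ", ".join(f"%({c})s" for c in columns)
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({placeholders}) "
        f"ON CONFLICT ({key}) DO UPDATE SET {updates}"
    )


sql = build_upsert_sql()
```

The table and column names are interpolated from a fixed tuple in code, never from user input, so only the row values travel as bound parameters.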

Deployment

Processor Lambda

cd lambdas/processor
./deploy.sh

The deploy script:

  1. Builds dependencies for Linux x86_64 (Docker)
  2. Validates architecture
  3. Packages and uploads to S3
  4. Updates Lambda
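
What deploy.sh checks in step 2 is not documented here. As one possible approach, a compiled extension's file header can be inspected to confirm it is an x86-64 ELF binary rather than a macOS Mach-O one. The function name and return labels below are hypothetical. The magic numbers and the `e_machine` offset come from the ELF and Mach-O formats.

```python
import struct

EM_X86_64 = 62  # ELF e_machine value for x86-64


def describe_binary(header: bytes) -> str:
    """Classify a shared library from the first 20 bytes of the file."""
    if header[:4] == b"\x7fELF":
        if len(header) < 20:
            return "elf-truncated"
        byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
        (machine,) = struct.unpack_from(byte_order + "H", header, 18)  # e_machine field
        return "elf-x86_64" if machine == EM_X86_64 else f"elf-machine-{machine}"
    if header[:4] in (b"\xcf\xfa\xed\xfe", b"\xca\xfe\xba\xbe"):
        return "macos-mach-o"  # 64-bit Mach-O or universal binary
    return "unknown"


# Synthetic headers for illustration: ident bytes, padding, e_type, e_machine.
elf_header = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, EM_X86_64)
macho_header = b"\xcf\xfa\xed\xfe" + bytes(16)
kinds = (describe_binary(elf_header), describe_binary(macho_header))
```

In practice the header would be read with `open(path, "rb").read(20)` for each `.so` file in the package before upload.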

Cloud Run Service

cd services/ai-scorer
./deploy.sh

Environment Variables

Processor Lambda:

Variable                    Description
BUSINESS_ENTITY_SCORER_URL  Cloud Run endpoint
SUPABASE_DB_URL             Database connection
ENABLE_AI_SCORING           true/false
GCP_PROJECT_ID              precise-braid-447604-a9
GCP_LOCATION                us-east1
GCP_CREDENTIALS_SECRET      alianza/gcp-service-account
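
A processor could read these variables into a typed config at cold start, roughly as sketched below. The `ProcessorConfig` class and `load_config` function are hypothetical, the two URLs in the example are placeholders, and treating a missing `ENABLE_AI_SCORING` as false is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ProcessorConfig:
    scorer_url: str
    db_url: str
    ai_scoring_enabled: bool
    gcp_project_id: str
    gcp_location: str
    gcp_credentials_secret: str


def load_config(env: Mapping[str, str] = os.environ) -> ProcessorConfig:
    """Read required variables, raising KeyError early if one is missing."""
    return ProcessorConfig(
        scorer_url=env["BUSINESS_ENTITY_SCORER_URL"],
        db_url=env["SUPABASE_DB_URL"],
        # Assumption: anything other than "true" (case-insensitive) disables scoring.
        ai_scoring_enabled=env.get("ENABLE_AI_SCORING", "false").strip().lower() == "true",
        gcp_project_id=env["GCP_PROJECT_ID"],
        gcp_location=env["GCP_LOCATION"],
        gcp_credentials_secret=env["GCP_CREDENTIALS_SECRET"],
    )


cfg = load_config({
    "BUSINESS_ENTITY_SCORER_URL": "https://scorer.example.invalid",  # placeholder
    "SUPABASE_DB_URL": "postgresql://user:password@db.example.invalid:5432/postgres",  # placeholder
    "ENABLE_AI_SCORING": "True",
    "GCP_PROJECT_ID": "precise-braid-447604-a9",
    "GCP_LOCATION": "us-east1",
    "GCP_CREDENTIALS_SECRET": "alianza/gcp-service-account",
})
```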

Monitoring

CloudWatch Alarms:

  • puertoricoe-processor-errors-dev - Error rate >5 in 5min
  • puertoricoe-processor-duration-warning-dev - Duration >4min

Alerts sent to: noc@alianzacap.com

Check status:

# Queue depth
aws sqs get-queue-attributes \
  --queue-url https://sqs.us-east-2.amazonaws.com/915848750366/puertoricoe-records \
  --attribute-names All --region us-east-2

# DLQ
aws sqs get-queue-attributes \
  --queue-url https://sqs.us-east-2.amazonaws.com/915848750366/puertoricoe-records-dlq \
  --attribute-names All --region us-east-2

# Lambda logs
aws logs tail /aws/lambda/puertoricoe-processor --follow --region us-east-2
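
The `get-queue-attributes` commands above return a JSON object whose `Attributes` values are strings. A small helper like the following sketch can reduce that output to the counts that matter when checking queue depth. The function name and the sample payload are illustrative. The attribute names are the standard SQS ones.

```python
import json


def summarize_queue(cli_output: str) -> dict[str, int]:
    """Extract message counts from `aws sqs get-queue-attributes` JSON output."""
    attrs = json.loads(cli_output).get("Attributes", {})
    return {
        "visible": int(attrs.get("ApproximateNumberOfMessages", 0)),
        "in_flight": int(attrs.get("ApproximateNumberOfMessagesNotVisible", 0)),
        "delayed": int(attrs.get("ApproximateNumberOfMessagesDelayed", 0)),
    }


# Illustrative payload, not real queue output.
sample = (
    '{"Attributes": {"ApproximateNumberOfMessages": "12", '
    '"ApproximateNumberOfMessagesNotVisible": "3"}}'
)
summary = summarize_queue(sample)
```

A non-zero "visible" count on the DLQ is the number to watch, since it means batches failed after exhausting their retries.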

Troubleshooting

ImportModuleError

Cause: Dependencies were built for the wrong platform (for example, on macOS instead of Linux x86_64).

Fix: Deploy with deploy.sh, which builds dependencies in Docker for Linux x86_64.

Lambda Timeouts

Cause: AI scoring pushes the invocation past the 5-minute Lambda timeout.

Fix: Batch size was reduced to 250 and Cloud Run calls were parallelized.

AI Scoring Failures

Cause: A problem with the Cloud Run service or with the GCP credentials.

Fix: Check the Cloud Run logs and verify the service account credentials.

  • Source: alianza-hq/backend/puertoricoe/
  • Infrastructure: alianza-infra/modules/puertoricoe/