PuertoRicoE Integration
Processes real estate transactions from puertoricoe.com ("Comparables Online"), a transaction database operated by the company behind Clasificados (Puerto Rico's Craigslist).
System Overview
Apify Scraper --> Webhook --> Ingestor --> SQS --> Processor --> Supabase
                                                       |
                                                       v
                                            Cloud Run (AI Scoring)
Components
AWS Lambdas
| Lambda | Purpose |
|---|---|
| ingestor | Receives webhook from Apify, sends to SQS |
| fetcher | Fetches datasets from Apify API |
| processor | Main pipeline (transform, score, enrich, upsert) |
| authorizer | API Gateway authentication |
GCP Cloud Run
| Service | Purpose |
|---|---|
| ai-scorer | Scores buyers/sellers as business entities vs individuals |
Data Flow
Apify (scrapes puertoricoe.com)
| Webhook
v
Ingestor Lambda
| SQS (batch: 250)
v
Processor Lambda
| Extract unique buyer/seller names
v
Cloud Run (ai-scorer)
| Vertex AI Gemini 2.5 Flash Lite
| Returns business entity scores (0-1)
v
Processor Lambda
| Enrich with location data
| Upsert with scores
v
Supabase (puertoricoe_transactions)
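
The processor steps in the flow above can be sketched as follows. This is a minimal illustration, not the actual processor code: the function names `unique_party_names` and `attach_scores` are hypothetical, and records are assumed to be plain dicts with `buyer` and `seller` keys matching the table columns. Deduplicating first means each distinct name needs to be scored only once per batch.

```python
from typing import Dict, Iterable, List, Optional

Record = Dict[str, object]


def unique_party_names(records: Iterable[Record]) -> List[str]:
    """Collect distinct, non-empty buyer and seller names in first-seen order."""
    seen: Dict[str, None] = {}
    for record in records:
        for field in ("buyer", "seller"):
            name = record.get(field)
            if isinstance(name, str) and name.strip():
                seen.setdefault(name.strip(), None)
    return list(seen)


def attach_scores(
    records: Iterable[Record], scores: Dict[str, Optional[float]]
) -> List[Record]:
    """Return copies of the records with buyer/seller score columns filled in.

    Names without a score (blank, or scoring failed) get None.
    """
    enriched: List[Record] = []
    for record in records:
        row = dict(record)
        for field in ("buyer", "seller"):
            name = record.get(field)
            key = name.strip() if isinstance(name, str) else ""
            row[f"{field}_business_score"] = scores.get(key) if key else None
        enriched.append(row)
    return enriched
```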
Processing Performance
Per 250-record batch:
- AI scoring: 145-220s (parallel Cloud Run calls)
- Database enrichment: 5-10s
- Database upsert: 10-20s
- Total: ~180-250 seconds
For 10,000 records:
- 40 batches at 10 concurrent Lambdas
- Total time: 13-15 minutes
Critical path: AI scoring (roughly 85% of per-batch processing time)
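
As a back-of-envelope check on the figures above, the sketch below estimates total run time from the batch size, concurrency, and per-batch duration. It assumes batches run in lockstep waves and ignores cold starts and queue latency, so it gives a rough bracket rather than a measurement. The function name `estimate_run_seconds` is hypothetical.

```python
import math


def estimate_run_seconds(total_records, batch_size=250, concurrency=10,
                         batch_seconds=(180, 250)):
    """Return (batches, waves, low_seconds, high_seconds) for a full run."""
    batches = math.ceil(total_records / batch_size)
    # With `concurrency` Lambdas running at once, batches complete in waves.
    waves = math.ceil(batches / concurrency)
    low, high = batch_seconds
    return batches, waves, waves * low, waves * high


batches, waves, low, high = estimate_run_seconds(10_000)
print(f"{batches} batches in {waves} waves: {low / 60:.1f}-{high / 60:.1f} minutes")
# prints "40 batches in 4 waves: 12.0-16.7 minutes"
```

The observed 13-15 minutes falls inside this 12.0-16.7 minute bracket.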
AI Business Entity Scoring
Purpose: Identify professional real estate investors vs individuals/families
Score Ranges:
| Score | Entity Type |
|---|---|
| 0.0 | Individuals, family estates ("Sucesion de...", "Estate of...") |
| 0.2-0.4 | Ambiguous (personal trusts) |
| 0.8-1.0 | Business entities (LLC, Corp, Properties, Holdings) |
Examples:
"Sucesion de Juan Perez" --> 0.0
"Estate of Maria Garcia" --> 0.0
"ABC Properties LLC" --> 0.95
"First Bank PR" --> 0.95
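
A consumer of these scores might map them back to the categories in the table. The sketch below is one way to do that. The helper name `describe_score` and the `"unclassified"` label for scores that fall between the documented ranges are assumptions, since the table does not say how such values should be treated.

```python
def describe_score(score: float) -> str:
    """Map a business entity score (0-1) to the documented category."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score == 0.0:
        return "individual"
    if 0.2 <= score <= 0.4:
        return "ambiguous"
    if score >= 0.8:
        return "business"
    # Scores outside the documented ranges (assumption: leave them unlabeled).
    return "unclassified"
```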
Implementation:
- Cloud Run service wraps Vertex AI (avoids Lambda's 250 MB unzipped deployment package limit)
- Parallel requests to Cloud Run (2-3 concurrent per Lambda)
- Non-blocking (continues processing if scoring fails)
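
The parallel, non-blocking behavior described above could look roughly like this. It is a sketch under assumptions, not the production code: `score_chunk` stands in for whatever function actually calls the Cloud Run endpoint, the chunk size of 50 is invented for illustration, and `max_workers=3` reflects the 2-3 concurrent requests per Lambda noted above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical signature: takes a list of names, returns {name: score}.
ScoreChunk = Callable[[List[str]], Dict[str, float]]


def score_names(names: Sequence[str], score_chunk: ScoreChunk,
                chunk_size: int = 50,
                max_workers: int = 3) -> Dict[str, Optional[float]]:
    """Score names in parallel; names in a failed chunk keep a score of None."""
    scores: Dict[str, Optional[float]] = {name: None for name in names}
    chunks = [list(names[i:i + chunk_size])
              for i in range(0, len(names), chunk_size)]

    def safe_call(chunk: List[str]) -> Dict[str, float]:
        try:
            return score_chunk(chunk)
        except Exception:
            # Non-blocking: a scoring failure must not stop the batch.
            return {}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for result in pool.map(safe_call, chunks):
            scores.update(result)
    return scores
```

Returning `None` for unscored names lets the upsert proceed with empty score columns when scoring fails.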
Database Schema
Table: public.puertoricoe_transactions
| Column | Type | Description |
|---|---|---|
| transaction_id | UUID | Primary key |
| buyer | String | Buyer name |
| seller | String | Seller name |
| buyer_business_score | Float | AI business entity score |
| seller_business_score | Float | AI business entity score |
| sales_price | Integer | Transaction amount |
| transaction_date | Date | Transaction date |
| muni_id | Integer | Municipality ID |
| barrio_id | Integer | Barrio ID |
| catastro_id | String | CRIM catastro number |
| geo | Geography | PostGIS point |
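
For illustration, an upsert against this table could be built as below. This is a sketch, not the processor's actual statement: it assumes the conflict target is the `transaction_id` primary key, uses psycopg-style `%s` placeholders, and the helper name `build_upsert_sql` is hypothetical. How the `geo` value is passed (for example as WKT text cast to geography) is left out.

```python
from typing import Sequence

COLUMNS = (
    "transaction_id", "buyer", "seller",
    "buyer_business_score", "seller_business_score",
    "sales_price", "transaction_date",
    "muni_id", "barrio_id", "catastro_id", "geo",
)


def build_upsert_sql(table: str = "public.puertoricoe_transactions",
                     columns: Sequence[str] = COLUMNS,
                     key: str = "transaction_id") -> str:
    """Build a parameterized PostgreSQL INSERT ... ON CONFLICT DO UPDATE."""
    placeholders = ", ".join(["%s"] * len(columns))
    # On conflict, overwrite every non-key column with the incoming value.
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({placeholders}) "
        f"ON CONFLICT ({key}) DO UPDATE SET {updates}"
    )
```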
Deployment
Processor Lambda
cd lambdas/processor
./deploy.sh
The deploy script:
- Builds dependencies for Linux x86_64 (Docker)
- Validates architecture
- Packages and uploads to S3
- Updates Lambda
Cloud Run Service
cd services/ai-scorer
./deploy.sh
Environment Variables
Processor Lambda:
| Variable | Description |
|---|---|
| BUSINESS_ENTITY_SCORER_URL | Cloud Run endpoint |
| SUPABASE_DB_URL | Database connection |
| ENABLE_AI_SCORING | true/false |
| GCP_PROJECT_ID | precise-braid-447604-a9 |
| GCP_LOCATION | us-east1 |
| GCP_CREDENTIALS_SECRET | alianza/gcp-service-account |
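
A minimal sketch of reading these variables at startup follows. The `ProcessorConfig` class and `load_config` function are hypothetical names. Treating every variable except `ENABLE_AI_SCORING` as required, and defaulting scoring to off when it is unset, are assumptions rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProcessorConfig:
    scorer_url: str
    db_url: str
    ai_scoring_enabled: bool
    gcp_project_id: str
    gcp_location: str
    gcp_credentials_secret: str


REQUIRED = (
    "BUSINESS_ENTITY_SCORER_URL", "SUPABASE_DB_URL",
    "GCP_PROJECT_ID", "GCP_LOCATION", "GCP_CREDENTIALS_SECRET",
)


def load_config(env: Optional[Mapping[str, str]] = None) -> ProcessorConfig:
    """Read processor settings from the environment, failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return ProcessorConfig(
        scorer_url=env["BUSINESS_ENTITY_SCORER_URL"],
        db_url=env["SUPABASE_DB_URL"],
        # Only the literal string "true" (any case) enables scoring.
        ai_scoring_enabled=env.get("ENABLE_AI_SCORING", "false").strip().lower() == "true",
        gcp_project_id=env["GCP_PROJECT_ID"],
        gcp_location=env["GCP_LOCATION"],
        gcp_credentials_secret=env["GCP_CREDENTIALS_SECRET"],
    )
```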
Monitoring
CloudWatch Alarms:
- puertoricoe-processor-errors-dev: Error rate >5 in 5 min
- puertoricoe-processor-duration-warning-dev: Duration >4 min
Alerts sent to: noc@alianzacap.com
Check status:
# Queue depth
aws sqs get-queue-attributes \
--queue-url https://sqs.us-east-2.amazonaws.com/915848750366/puertoricoe-records \
--attribute-names All --region us-east-2
# DLQ
aws sqs get-queue-attributes \
--queue-url https://sqs.us-east-2.amazonaws.com/915848750366/puertoricoe-records-dlq \
--attribute-names All --region us-east-2
# Lambda logs
aws logs tail /aws/lambda/puertoricoe-processor --follow --region us-east-2
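
When scripting these checks, the JSON printed by `aws sqs get-queue-attributes` can be reduced to the counts that matter. The helper below is a small sketch (the name `queue_depth` is hypothetical). It relies on the CLI's JSON output, which nests string-valued attributes under an `Attributes` key.

```python
import json
from typing import Dict


def queue_depth(cli_output: str) -> Dict[str, int]:
    """Extract message counts from `aws sqs get-queue-attributes` JSON output."""
    attrs = json.loads(cli_output).get("Attributes", {})
    return {
        "visible": int(attrs.get("ApproximateNumberOfMessages", 0)),
        "in_flight": int(attrs.get("ApproximateNumberOfMessagesNotVisible", 0)),
        "delayed": int(attrs.get("ApproximateNumberOfMessagesDelayed", 0)),
    }
```

A non-zero `visible` count on the DLQ indicates records that failed processing and need inspection.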
Troubleshooting
ImportModuleError
Dependencies were built for the wrong platform (macOS instead of Linux).
Fix: Use deploy.sh, which builds dependencies in Docker for Linux x86_64.
Lambda Timeouts
AI scoring pushes a batch past the 5-minute Lambda timeout.
Fix: Batch size was reduced to 250 and Cloud Run calls now run in parallel.
AI Scoring Failures
Usually caused by a Cloud Run or GCP credentials issue.
Fix: Check the Cloud Run logs and verify the service account credentials.
Related Documentation
- Source: alianza-hq/backend/puertoricoe/
- Infrastructure: alianza-infra/modules/puertoricoe/