Deliverable D2.1.2: Cloud Architecture Solution Design
3.1 Approach: Self-Hosted on Existing Infrastructure
Deployment Context
The MAPS Data Lake will be deployed on existing DEPP/Openpolis infrastructure:
- Server: op-linkurious (8 CPU, 31GB RAM, 621GB disk)
- Network: Traefik reverse proxy on the gw network
- Domain: *.maps.deppsviluppo.org
Self-Hosted Rationale
- Predictable costs: no per-query pricing (BigQuery), no per-GB storage fees (S3/GCS)
- Full control: independence from vendor lock-in
- Compliance: data stays on-premise
- Appropriate scale: ~10⁴ rows × ~10³ cols do not require cloud-scale infrastructure
3.2 Container Architecture
Docker Compose Multi-Service
```yaml
version: '3.8'

services:
  # Primary storage
  postgres:
    image: postgis/postgis:17-3.5
    container_name: maps-postgres
    volumes:
      - postgres-data:/var/lib/postgresql/data
      - ./init-scripts:/docker-entrypoint-initdb.d  # init SQL (e.g. extra databases/roles for prefect, openmetadata)
    networks:
      - maps-internal
    environment:
      POSTGRES_DB: maps_db
      POSTGRES_USER: maps
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G

  # Orchestration
  prefect-server:
    image: prefecthq/prefect:3-python3.11
    container_name: maps-prefect-server
    command: prefect server start
    networks:
      - maps-internal
      - gw  # Traefik
    environment:
      PREFECT_SERVER_API_HOST: 0.0.0.0
      # Prefect's async engine requires the asyncpg driver in the connection URL
      PREFECT_API_DATABASE_CONNECTION_URL: postgresql+asyncpg://prefect:${PREFECT_PASSWORD}@maps-postgres:5432/prefect
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.maps-prefect.rule=Host(`prefect.maps.deppsviluppo.org`)"
      - "traefik.http.services.maps-prefect.loadbalancer.server.port=4200"  # explicit backend port for Traefik

  # Worker pools (multi-pool architecture)
  worker-istat:
    build: ./prefect/flows/istat/
    # no container_name: the service runs with multiple replicas (--scale)
    command: prefect worker start --pool istat-pool --type process
    volumes:
      - ./prefect/flows/istat:/flows:ro
      - ./shared-data:/data:rw
    networks:
      - maps-internal
    environment:
      PREFECT_API_URL: http://maps-prefect-server:4200/api
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '1'
          memory: 1G

  worker-pdf:
    build: ./prefect/flows/pdf-extraction/
    container_name: maps-worker-pdf
    command: prefect worker start --pool pdf-pool --type process
    volumes:
      - ./prefect/flows/pdf-extraction:/flows:ro
      - ./shared-data:/data:rw
    networks:
      - maps-internal
    environment:
      PREFECT_API_URL: http://maps-prefect-server:4200/api
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G

  # Data catalog
  openmetadata:
    image: openmetadata/server:latest
    container_name: maps-openmetadata
    networks:
      - maps-internal
      - gw
    environment:
      DB_HOST: maps-postgres
      DB_PORT: 5432
      DB_USER: openmetadata
      DB_PASSWORD: ${OPENMETADATA_PASSWORD}
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.maps-metadata.rule=Host(`metadata.maps.deppsviluppo.org`)"
      - "traefik.http.services.maps-metadata.loadbalancer.server.port=8585"  # OpenMetadata UI/API port

networks:
  maps-internal:
    driver: bridge
  gw:
    external: true  # shared Traefik network

volumes:
  postgres-data:
  shared-data:
```
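Once the stack is defined, a brief usage sketch for bringing it up and confirming the orchestrator answers, assuming a populated .env file and the DNS/Traefik routing described in section 3.4 (the /api/health path is Prefect's standard health endpoint):
```bash
# start all services in the background and list their status
docker-compose up -d
docker-compose ps

# check that the Prefect API answers through Traefik
curl -fsS https://prefect.maps.deppsviluppo.org/api/health
```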
3.3 Resource Allocation
Service Sizing
| Service | CPU | RAM | Storage | Rationale |
|---|---|---|---|---|
| PostgreSQL | 4 cores | 8GB | 200GB | Main workload, PostGIS operations |
| Prefect Server | 1 core | 2GB | 10GB | Lightweight orchestrator |
| ISTAT worker (×2) | 1 core each | 1GB each | - | CSV/Excel ingestion |
| PDF worker | 4 cores | 4GB | - | Docling ML models |
| OpenMetadata | 2 cores | 4GB | 50GB | Metadata catalog |
| TOTAL | 12 cores | 20GB | 260GB | Fits op-linkurious (8 CPU / 31GB / 621GB); CPU limits are caps and oversubscribe the 8 physical cores |
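The limits above are caps rather than reservations; actual consumption can be compared against them at runtime with a plain Docker check (illustrative, not part of the deliverable):
```bash
# one-shot snapshot of per-container CPU and memory usage
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
```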
Vertical/Horizontal Scalability
Vertical (increase resources of a single service; see the override sketch after this list):
- PostgreSQL: up to 8 cores / 16GB (if needed)
- PDF worker: up to 6 cores / 6GB (for heavy Docling workloads)
Horizontal (service replication):
- ISTAT worker: scale up to 4 replicas (docker-compose up -d --scale worker-istat=4)
- PDF worker: no horizontal scaling (ML models are memory-intensive)
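A minimal sketch of how the vertical upgrades above could be applied without touching the base file, assuming a docker-compose.override.yml placed next to docker-compose.yml (the file name and values are illustrative, not part of the deliverable):
```yaml
# docker-compose.override.yml -- illustrative vertical scaling values
services:
  postgres:
    deploy:
      resources:
        limits:
          cpus: '8'     # raised from 4
          memory: 16G   # raised from 8G
  worker-pdf:
    deploy:
      resources:
        limits:
          cpus: '6'     # raised from 4
          memory: 6G    # raised from 4G
```
Compose merges an override file automatically, so the base compose file stays under version control unchanged; ISTAT workers are scaled at runtime with the --scale command shown above.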
3.4 Networking and Security
Traefik Reverse Proxy
```yaml
# routing labels applied to each exposed service
traefik.http.routers.{service}.rule=Host(`{service}.maps.deppsviluppo.org`)
traefik.http.routers.{service}.tls=true
traefik.http.routers.{service}.tls.certresolver=letsencrypt
```
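The certresolver=letsencrypt label assumes a resolver with that name in Traefik's static configuration. A minimal sketch in case the shared Traefik instance on the gw network does not already define one (e-mail, storage path and entrypoint name are placeholders):
```yaml
# traefik.yml (static configuration) -- illustrative ACME resolver definition
certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@example.org         # placeholder contact address
      storage: /letsencrypt/acme.json  # persisted file for issued certificates
      httpChallenge:
        entryPoint: web                # assumes an HTTP entrypoint named "web"
```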
DNS Configuration (AWS Route53)
```bash
# dns-setup.sh -- create the A record for the Prefect endpoint
# (double-quoted JSON so ${SERVER_IP} is expanded by the shell)
aws route53 change-resource-record-sets \
  --hosted-zone-id "${HOSTED_ZONE_ID}" \
  --change-batch "{
    \"Changes\": [{
      \"Action\": \"CREATE\",
      \"ResourceRecordSet\": {
        \"Name\": \"prefect.maps.deppsviluppo.org\",
        \"Type\": \"A\",
        \"TTL\": 300,
        \"ResourceRecords\": [{\"Value\": \"${SERVER_IP}\"}]
      }
    }]
  }"
```
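Propagation can be checked before enabling the TLS challenge, for example with dig (a quick verification, not part of the deliverable scripts):
```bash
# the record should resolve to ${SERVER_IP} before Traefik requests certificates
dig +short A prefect.maps.deppsviluppo.org
```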
Firewall Rules
```bash
# Ports exposed on op-linkurious
80/tcp   - HTTP (redirect to HTTPS)
443/tcp  - HTTPS (Traefik)
5432/tcp - PostgreSQL (internal network only)
```
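One possible implementation of these rules, assuming the host firewall is managed with ufw and that the internal network is 10.0.0.0/24 (both assumptions to be adapted to the actual DEPP setup):
```bash
# illustrative ufw rules matching the port table above (subnet is a placeholder)
ufw default deny incoming
ufw allow 80/tcp                                        # HTTP (redirect to HTTPS)
ufw allow 443/tcp                                       # HTTPS (Traefik)
ufw allow from 10.0.0.0/24 to any port 5432 proto tcp   # PostgreSQL, internal only
ufw enable
```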
Secrets Management
```bash
# Generate secrets into .env (kept out of git); Docker Compose reads .env
# literally, so the values are generated here rather than expanded at runtime
{
  echo "POSTGRES_PASSWORD=$(openssl rand -base64 32)"
  echo "PREFECT_PASSWORD=$(openssl rand -base64 32)"
  echo "OPENMETADATA_PASSWORD=$(openssl rand -base64 32)"
} > .env
chmod 600 .env
```
3.5 Backup and Disaster Recovery
Strategy
PostgreSQL:
```bash
#!/bin/bash
# backup.sh -- nightly logical dump of maps_db
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
docker exec maps-postgres pg_dump -U maps maps_db | \
  gzip > /backup/maps_db_${TIMESTAMP}.sql.gz
# Retention: 7 daily, 4 weekly, 3 monthly
```
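The manual restore behind the 4h RTO below, as a minimal sketch assuming the database can be recreated from scratch and using an illustrative dump file name:
```bash
# stop services holding connections first, then recreate and reload the database
docker exec maps-postgres psql -U maps -d postgres -c "DROP DATABASE IF EXISTS maps_db;"
docker exec maps-postgres psql -U maps -d postgres -c "CREATE DATABASE maps_db OWNER maps;"
gunzip -c /backup/maps_db_20250101_020000.sql.gz | \
  docker exec -i maps-postgres psql -U maps -d maps_db
```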
Shared Data (Bronze layer):
```bash
# Incremental rsync of the Bronze layer to the backup host
rsync -avz --progress /data/bronze/ backup-server:/backup/bronze/
```
RPO/RTO:
- Recovery Point Objective: 24h (daily backup)
- Recovery Time Objective: 4h (manual restore)
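A sketch of how the 24h RPO could be scheduled, assuming root's cron on op-linkurious and illustrative paths for the scripts above:
```bash
# /etc/cron.d/maps-backup -- illustrative schedule (paths and times are placeholders)
# nightly PostgreSQL dump at 02:00
0 2 * * * root /opt/maps/backup.sh >> /var/log/maps-backup.log 2>&1
# incremental Bronze-layer sync at 03:00
0 3 * * * root rsync -az /data/bronze/ backup-server:/backup/bronze/ >> /var/log/maps-backup.log 2>&1
```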
[WIP] This chapter will be completed with:
- Detailed architecture diagrams
- Security hardening checklist
- Monitoring stack (Prometheus, Grafana)
- CI/CD pipeline
Next chapter: Hosting Infrastructure Specifications