System Architecture
Below is a summary of the Production VPS and Development Laptop architectures. Both environments use Docker containers for consistency, with near-identical stacks where practical.
```mermaid
flowchart LR
%% Client
A(Browser / PWA)
Y(iOS App / Android App)
subgraph User
A
Y
end
%% LLM / Realtime
B(OpenAI Realtime API)
Z(Gemini Live API)
subgraph Large Language Model
B
Z
end
%% Server-side
C(Caddy)
I(Gitea + Actions + Repositories)
J(Gitea Runner)
D(Next.js Frontend)
E(FastAPI Backend + Agent Runtime)
G(LiveKit Server)
H[(PostgreSQL + pgvector)]
%% Client ↔ VPS
A <-- "https://www.avaaz.ai" --> C
A <-- "https://app.avaaz.ai" --> C
A & Y <-- "https://api.avaaz.ai" --> C
A & Y <-- "wss://rtc.avaaz.ai" --> C
A & Y <-- "udp://rtc.avaaz.ai:50000-60000 (WebRTC Media)" --> G
%% Caddy ↔ App
C <-- "http://frontend:3000 (app)" --> D
C <-- "http://backend:8000 (api)" --> E
C <-- "ws://livekit:7880 (WebRTC signaling)" --> G
C <-- "http://gitea:3000 (git)" --> I
%% App internal
D <-- "http://backend:8000" --> E
E <-- "postgresql://postgres:5432" --> H
E <-- "http://livekit:7880 (control)" --> G
E <-- "Agent joins via WebRTC" --> G
%% Agent ↔ LLM
E <-- "WSS/WebRTC (realtime)" --> B
E <-- "WSS (streaming)" --> Z
%% CI/CD
I <-- "CI/CD triggers" --> J
subgraph VPS
subgraph Infra
C
I
J
end
subgraph App
D
E
G
H
end
end
%% Development Environment
L(VS Code + Git + Docker)
M(Local Docker Compose)
N(Local Browser)
O(Local Frontend)
P(Local Backend)
Q[(Local Postgres)]
R(Local LiveKit)
L <-- "https://git.avaaz.ai/...git" --> C
L <-- "ssh://git@git.avaaz.ai:2222/..." --> I
L -- "docker compose up" --> M
M -- "Build & Run" --> O & P & Q & R
N <-- HTTP --> O & P
N <-- WebRTC --> R
O <-- HTTP --> P
P <-- SQL --> Q
P <-- HTTP/WebRTC --> R
P <-- WSS/WebRTC --> B
P <-- WSS --> Z
subgraph Development Laptop
L
M
N
subgraph Local App
O
P
Q
R
end
end
```
1. Production VPS
1.1 Components
Infra Stack
Docker Compose: ./infra/docker-compose.yml.
| Container | Description |
|---|---|
| `caddy` | Caddy – Reverse proxy with automatic HTTPS (TLS termination via Let’s Encrypt). |
| `gitea` | Gitea + Actions – Git server using SQLite. Automated CI/CD workflows. |
| `gitea-runner` | Gitea Runner – Executes CI/CD jobs defined in Gitea Actions workflows. |
App Stack
Docker Compose: ./app/docker-compose.yml.
| Container | Description |
|---|---|
| `frontend` | Next.js Frontend – SPA/PWA interface served from a Node.js-based Next.js server. |
| `backend` | FastAPI + Uvicorn Backend – API, auth, business logic, LiveKit orchestration, agent. |
| `postgres` | PostgreSQL + pgvector – Persistent relational database with vector search. |
| `livekit` | LiveKit Server – WebRTC signaling plus UDP media for real-time audio and data. |
The backend is built on several Python packages, including uv, Ruff, FastAPI, FastAPI Users, fastapi-pagination, FastStream, FastMCP, Pydantic, PydanticAI, pydantic-settings, the LiveKit Agents SDK, client libraries for the Google Gemini Live API and the OpenAI Realtime API, SQLAlchemy, Alembic, Docling, Gunicorn, Uvicorn[standard], Pyright, Pytest, Hypothesis, and HTTPX.
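To make the wiring concrete, here is a minimal, hypothetical sketch of a backend entrypoint assembled with FastAPI and pydantic-settings. The module name, settings fields, and defaults are illustrative assumptions, not the actual project layout.

```python
# app/main.py - illustrative sketch only; the real module layout and settings differ.
from contextlib import asynccontextmanager

from fastapi import FastAPI
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Environment-driven configuration (field names are assumptions)."""
    database_url: str = "postgresql+asyncpg://avaaz:avaaz@postgres:5432/avaaz"
    livekit_url: str = "ws://livekit:7880"
    livekit_api_key: str = ""
    livekit_api_secret: str = ""
    openai_api_key: str = ""


settings = Settings()  # values come from the environment / .env file


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: create the DB engine, warm caches, start background workers, etc.
    yield
    # Shutdown: dispose connections.


app = FastAPI(title="Avaaz Backend", lifespan=lifespan)


@app.get("/health")
async def health() -> dict[str, str]:
    """Liveness probe of the kind used by the post-deploy health checks."""
    return {"status": "ok"}
```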
1.2 Network
- All containers join a shared `proxy` Docker network.
- Caddy can route to any service by container name.
- App services communicate internally:
  - Frontend ↔ Backend
  - Backend ↔ Postgres
  - Backend ↔ LiveKit
  - Backend (agent) ↔ LiveKit & external LLM realtime APIs
1.3 Public DNS Records
| Hostname | Record Type | Target | Purpose |
|---|---|---|---|
| www.avaaz.ai | CNAME | avaaz.ai | Marketing / landing site |
| avaaz.ai | A | 217.154.51.242 | Root domain |
| app.avaaz.ai | A | 217.154.51.242 | Next.js frontend (SPA/PWA) |
| api.avaaz.ai | A | 217.154.51.242 | FastAPI backend |
| rtc.avaaz.ai | A | 217.154.51.242 | LiveKit signaling + media |
| git.avaaz.ai | A | 217.154.51.242 | Gitea (HTTPS + SSH) |
1.4 Public Inbound Firewall Ports & Protocols
| Port | Protocol | Purpose |
|---|---|---|
| 80 | TCP | HTTP, ACME HTTP-01 challenge |
| 443 | TCP | HTTPS, WSS (frontend, backend, LiveKit) |
| 2222 | TCP | Git SSH via Gitea |
| 2885 | TCP | VPS SSH access |
| 3478 | UDP | STUN/TURN |
| 5349 | TCP | TURN over TLS |
| 7881 | TCP | LiveKit TCP fallback |
| 50000–60000 | UDP | LiveKit WebRTC media |
1.5 Routing
Caddy
Caddy routes traffic from public ports 80 and 443 to internal services.
- `https://www.avaaz.ai` → `http://frontend:3000`
- `https://app.avaaz.ai` → `http://frontend:3000`
- `https://api.avaaz.ai` → `http://backend:8000`
- `wss://rtc.avaaz.ai` → `ws://livekit:7880`
- `https://git.avaaz.ai` → `http://gitea:3000`
Internal Container Network
- `frontend` → `http://backend:8000`
- `backend` → `postgres://postgres:5432`
- `backend` → `http://livekit:7880` (control)
- `backend` → `ws://livekit:7880` (signaling)
- `backend` → `udp://livekit:50000-60000` (media)
- `gitea-runner` → `/var/run/docker.sock` (Docker API on host)
Outgoing
- `backend` → `https://api.openai.com/v1/realtime/sessions`
- `backend` → `wss://api.openai.com/v1/realtime?model=gpt-realtime`
- `backend` → `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`
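As an illustration of the outgoing realtime connection, the sketch below opens the OpenAI Realtime WebSocket listed above using the `websockets` library. Bearer authentication is standard, but the session-negotiation messages are omitted and the exact headers and event payloads should be checked against the current OpenAI Realtime documentation; treat this as an assumption-laden sketch.

```python
# Hypothetical sketch of the backend's outgoing realtime connection.
import asyncio
import json
import os

import websockets  # keyword is `additional_headers` in websockets>=14, `extra_headers` before

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime"


async def connect_realtime() -> None:
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    async with websockets.connect(REALTIME_URL, additional_headers=headers) as ws:
        # The first server event typically describes the session; the payload
        # shape is an assumption here - consult the Realtime API reference.
        event = json.loads(await ws.recv())
        print(event.get("type"))


if __name__ == "__main__":
    asyncio.run(connect_realtime())
```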
1.6 Functional Layers
Data Layer
Infra:

- SQLite (Gitea)
  - Gitea stores Git metadata (users, repos, issues, Actions metadata) in `/data/gitea/gitea.db`.
  - This is a file-backed SQLite database inside a persistent Docker volume.
  - Repository contents are stored under `/data/git/`, also volume-backed.
- Gitea Runner State
  - The Gitea Actions runner stores its registration information and job metadata under `/data/.runner`.

App:

- PostgreSQL with pgvector
  - Primary relational database for users, lessons, transcripts, embeddings, and conversational context.
  - Hosted in the `postgres` container with a persistent Docker volume.
  - Managed via SQLAlchemy and Alembic migrations in the backend (see the model sketch after this list).
- LiveKit Ephemeral State
  - Room metadata, participant states, and signaling information persist in memory within the `livekit` container.
  - LiveKit’s SFU media buffers and room state are not persisted across restarts.
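A minimal sketch of how a pgvector-backed table could be declared with SQLAlchemy; the table, columns, and embedding dimension are hypothetical, and the real schema lives in the backend's models and Alembic migrations.

```python
# Hypothetical embedding table; real models and migrations live in the backend repo.
from pgvector.sqlalchemy import Vector
from sqlalchemy import String, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class TranscriptChunk(Base):
    """One chunk of a lesson transcript plus its embedding for similarity search."""
    __tablename__ = "transcript_chunks"

    id: Mapped[int] = mapped_column(primary_key=True)
    session_id: Mapped[str] = mapped_column(String(64), index=True)
    content: Mapped[str] = mapped_column(Text)
    # 1536 dimensions is an assumption; match the embedding model actually used.
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))
```

Similarity queries can then use the pgvector comparators, e.g. ordering by `TranscriptChunk.embedding.cosine_distance(query_vec)` and taking the top-k rows.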
Control Layer
Infra:

- Caddy
  - TLS termination (Let’s Encrypt).
  - Reverse proxy and routing for all public domains.
  - ACME certificate renewal.
- Gitea
  - Git hosting, pull/clone over SSH and HTTPS.
  - CI/CD orchestration via Actions and internal APIs.
- Gitea Runner
  - Executes workflows and controls the Docker engine via `/var/run/docker.sock`.

App:

- FastAPI Backend
  - Authentication and authorization (`/auth/login`, `/auth/refresh`, `/auth/me`).
  - REST APIs for lessons, progress, documents, and file handling.
  - LiveKit session management (room mapping via `/sessions/default`, token minting via `/sessions/default/token`, agent configuration; see the token sketch after this list).
  - Calls out to OpenAI Realtime and Gemini Live APIs for AI-driven conversational behavior.
- LiveKit Server
  - Manages room signaling, participant permissions, and session state.
  - Exposes an HTTP control endpoint for room and participant management.
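For the token-minting responsibility above, here is a sketch using the `livekit-api` Python package. The identity, room name, TTL, and helper name are placeholders; in the real backend this logic sits behind the `POST /sessions/default/token` route.

```python
# Hypothetical token-minting helper behind POST /sessions/default/token.
from datetime import timedelta

from livekit import api


def mint_session_token(identity: str, room: str, api_key: str, api_secret: str) -> str:
    """Return a short-lived, room-scoped LiveKit access token (JWT)."""
    token = (
        api.AccessToken(api_key, api_secret)
        .with_identity(identity)
        .with_ttl(timedelta(minutes=15))  # short-lived; affects only the initial join
        .with_grants(
            api.VideoGrants(
                room_join=True,
                room=room,
                can_publish=True,
                can_subscribe=True,
            )
        )
    )
    return token.to_jwt()
```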
Media Layer
App:

- User Audio Path
  - Browser/mobile → LiveKit:
    - WSS signaling via `rtc.avaaz.ai` → Caddy → `livekit:7880`.
    - UDP audio and data channels via `rtc.avaaz.ai:50000–60000` directly to LiveKit on the VPS.
  - WebRTC handles ICE, STUN/TURN, jitter buffers, and Opus audio encoding.
- AI Agent Audio Path
  - The agent logic inside the backend uses the LiveKit Agents SDK to join rooms as a participant (see the worker sketch after this list).
  - Agent → LiveKit:
    - WS signaling over the internal Docker network (`ws://livekit:7880`).
    - UDP audio transport as part of its WebRTC session.
  - Agent → LLM realtime API:
    - Secure WSS/WebRTC connection to OpenAI Realtime or Gemini Live.
  - The agent transcribes, processes, and generates audio responses, publishing them into the LiveKit room so the user hears natural speech.
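A minimal sketch of an agent worker joining a room with the LiveKit Agents SDK. The entrypoint body is a placeholder: the real agent also bridges audio between the room and the OpenAI/Gemini realtime APIs.

```python
# Hypothetical agent worker skeleton; the real agent wires in the realtime LLM session.
from livekit import agents


async def entrypoint(ctx: agents.JobContext) -> None:
    """Called for each assigned job: connect to the room and start the agent logic."""
    await ctx.connect()  # joins the LiveKit room as a participant over WebRTC
    # ... subscribe to the user's audio track, stream it to the LLM realtime API,
    # and publish the synthesized response audio back into the room ...


if __name__ == "__main__":
    # LIVEKIT_URL / LIVEKIT_API_KEY / LIVEKIT_API_SECRET are read from the environment.
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```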
1.7 CI/CD Pipeline
Production CI/CD is handled by Gitea Actions running on the VPS. The gitea-runner container has access to the host Docker daemon and is responsible for both validation and deployment:
- `.gitea/workflows/ci.yml` – Continuous Integration (branch/PR validation, no deployment).
- `.gitea/workflows/cd.yml` – Continuous Deployment (tag-based releases to production).
Build Phase (CI Workflow: ci.yml)
Triggers
- `push` to:
  - `feature/**`
  - `bugfix/**`
- `pull_request` targeting `main`.
Runner & Environment
- Runs on the self-hosted runner labeled `linux_amd64`.
- Checks out the relevant branch or PR commit from the `avaaz-app` repository into the runner’s workspace.
Steps
- Checkout code
  - Uses `actions/checkout@v4` to fetch the branch or PR head commit.
- Report triggering context
  - Logs the event type (`push` or `pull_request`) and branches:
    - For `push`: the source branch (e.g., `feature/foo`).
    - For `pull_request`: source and target (`main`).
- Static analysis & tests
  - Run linters, type checkers, and unit tests for backend and frontend.
  - Ensure the application code compiles/builds.
- Build Docker images for CI
  - Build images (e.g., `frontend:ci` and `backend:ci`) to validate Dockerfiles and the build chain.
  - These images are tagged for CI only and not used for production.
- Cleanup CI images
  - Remove CI-tagged images at the end of the job (even on failure) to prevent disk usage from accumulating.
Outcome
- A green CI result on a branch/PR signals that:
  - The code compiles/builds.
  - Static checks and tests pass.
  - Docker images can be built successfully.
- CI does not modify the production stack and does not depend on tags.
Deploy Phase (CD Workflow: cd.yml)
Triggers
- Creation of a Git tag matching `v*` that points to a commit on the `main` branch in the `avaaz-app` repository.
Runner & Environment
- Runs on the same `linux_amd64` self-hosted runner.
- Checks out the exact commit referenced by the tag.
Steps
- Checkout tagged commit
  - Uses `actions/checkout@v4` with `ref: ${{ gitea.ref }}` to check out the tagged commit.
- Tag validation
  - Fetches `origin/main`.
  - Verifies that the tag commit is an ancestor of `origin/main` (i.e., the tag points to code that has been merged into `main`).
  - Fails the deployment if the commit is not in `main`’s history.
- Build & publish release
  - Builds production Docker images for frontend, backend, LiveKit, etc., tagged with the version (e.g., `v0.1.0`).
  - Applies database migrations (e.g., via Alembic) if required.
- Restart production stack
  - Restarts or recreates the app stack containers using the newly built/tagged images (e.g., via `docker compose -f docker-compose.yml up -d`).
- Health & readiness checks
  - Probes key endpoints with `curl -f` (a Python equivalent is sketched after this list), such as:
    - `https://app.avaaz.ai`
    - `https://api.avaaz.ai/health`
    - `wss://rtc.avaaz.ai` (signaling-level check)
  - If checks fail, marks the deployment as failed and automatically rolls back to previous images.
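For illustration, a Python equivalent of the HTTP probes using HTTPX; the endpoint list mirrors the examples above, and the `wss://` signaling check is deliberately left out because it needs a WebSocket client.

```python
# Hypothetical post-deploy smoke test, roughly equivalent to the `curl -f` probes.
import sys

import httpx

ENDPOINTS = [
    "https://app.avaaz.ai",
    "https://api.avaaz.ai/health",
]


def main() -> int:
    failed = False
    for url in ENDPOINTS:
        try:
            r = httpx.get(url, timeout=10.0, follow_redirects=True)
            r.raise_for_status()  # same spirit as curl -f: any non-2xx is a failure
            print(f"OK   {url} -> {r.status_code}")
        except httpx.HTTPError as exc:
            print(f"FAIL {url} -> {exc}")
            failed = True
    # A signaling-level check of wss://rtc.avaaz.ai would need a WebSocket client
    # (e.g. the websockets library) and is omitted here.
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main())
```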
Outcome
- Only tagged releases whose commits are on the `main` branch are deployed.
- Deployment is explicit (tag-based), separated from CI validation.
1.8 Typical Workflows
User Login
- Browser loads the frontend from `https://app.avaaz.ai`.
- Frontend submits credentials to `POST https://api.avaaz.ai/auth/login`.
- Backend validates credentials and returns:
  - A short-lived JWT access token
  - A long-lived opaque refresh token
  - A minimal user profile for immediate UI hydration
- Frontend stores tokens appropriately (access token in memory; refresh token in secure storage or an httpOnly cookie).
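A sketch of this exchange from a non-browser client using HTTPX; the JSON field names in the request and response are assumptions, and in production the frontend performs the same calls from the browser.

```python
# Hypothetical login call; exact request/response field names may differ.
import httpx

API = "https://api.avaaz.ai"

with httpx.Client(base_url=API, timeout=10.0) as client:
    resp = client.post(
        "/auth/login",
        json={"email": "user@example.com", "password": "password"},  # placeholder credentials
    )
    resp.raise_for_status()
    body = resp.json()
    access_token = body["access_token"]    # short-lived JWT (field name assumed)
    refresh_token = body["refresh_token"]  # long-lived opaque token (field name assumed)

    # Authenticated follow-up call using the bearer token:
    me = client.get("/auth/me", headers={"Authorization": f"Bearer {access_token}"})
    me.raise_for_status()
    print(me.json())
```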
Load Persistent Session
- Frontend calls `GET https://api.avaaz.ai/sessions/default`.
- Backend retrieves or creates the user’s persistent conversational session, which encapsulates:
  - Long-running conversation state
  - Lesson and progress context
  - Historical summary for LLM context initialization
- Backend prepares the session’s LLM context so that the agent can join with continuity.
Join the Live Conversation Session
- Frontend requests a LiveKit access token via `POST https://api.avaaz.ai/sessions/default/token`.
- Backend generates a new LiveKit token (short-lived, room-scoped) containing:
  - Identity
  - Publish/subscribe permissions
  - Expiration (affecting the initial join)
  - Room ID corresponding to the session
- Frontend connects to the LiveKit server:
  - WSS for signaling
  - UDP/SCTP for low-latency audio and file transfer
- If the user disconnects, the frontend requests a new LiveKit token before rejoining, ensuring seamless continuity.
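In the browser this join flow runs through the LiveKit JS client, but the same sequence can be sketched in Python with the `livekit` realtime SDK. The response field name and connection details are assumptions.

```python
# Hypothetical join flow: fetch a room-scoped token, then connect over WebRTC.
import asyncio

import httpx
from livekit import rtc


async def join_session(access_token: str) -> None:
    async with httpx.AsyncClient(base_url="https://api.avaaz.ai") as client:
        resp = await client.post(
            "/sessions/default/token",
            headers={"Authorization": f"Bearer {access_token}"},
        )
        resp.raise_for_status()
        lk_token = resp.json()["token"]  # field name assumed

    room = rtc.Room()
    # Signaling goes over wss://rtc.avaaz.ai; media flows over UDP 50000-60000.
    await room.connect("wss://rtc.avaaz.ai", lk_token)
    print("connected as", room.local_participant.identity)
    await room.disconnect()


if __name__ == "__main__":
    asyncio.run(join_session("ACCESS_TOKEN"))
```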
Conversation with AI Agent
- Backend configures the session’s AI agent using:
  - Historical summary
  - Current lesson state
  - Language settings and mode (lesson, mock exam, free talk)
- The agent joins the same LiveKit room as a participant.
- All media flows through LiveKit:
  - User → audio → LiveKit → Agent
  - Agent → LLM realtime API → synthesized audio → LiveKit → User
- The agent guides the user verbally: continuing lessons, revisiting material, running mock exams, or free conversation.
The user experiences this as a continuous, ongoing session with seamless reconnection and state persistence.
1.9 Hardware
| Class | Description |
|---|---|
| system | Standard PC (i440FX + PIIX, 1996) |
| bus | Motherboard |
| memory | 96KiB BIOS |
| processor | AMD EPYC-Milan Processor |
| memory | 8GiB System Memory |
| bridge | 440FX - 82441FX PMC [Natoma] |
| bridge | 82371SB PIIX3 ISA [Natoma/Triton II] |
| communication | PnP device PNP0501 |
| input | PnP device PNP0303 |
| input | PnP device PNP0f13 |
| storage | PnP device PNP0700 |
| system | PnP device PNP0b00 |
| storage | 82371SB PIIX3 IDE [Natoma/Triton II] |
| bus | 82371SB PIIX3 USB [Natoma/Triton II] |
| bus | UHCI Host Controller |
| input | QEMU USB Tablet |
| bridge | 82371AB/EB/MB PIIX4 ACPI |
| display | QXL paravirtual graphic card |
| generic | Virtio RNG |
| storage | Virtio block device |
| disk | 257GB Virtual I/O device |
| volume | 238GiB EXT4 volume |
| volume | 4095KiB BIOS Boot partition |
| volume | 105MiB Windows FAT volume |
| volume | 913MiB EXT4 volume |
| network | Virtio network device |
| network | Ethernet interface |
| input | Power Button |
| input | AT Translated Set 2 keyboard |
| input | VirtualPS/2 VMware VMMouse |
2. Development Laptop
2.1 Components
App Stack (local Docker)
- `frontend` (Next.js SPA)
- `backend` (FastAPI)
- `postgres` (PostgreSQL + pgvector)
- `livekit` (local LiveKit Server)
No Caddy is deployed locally; the browser talks directly to the mapped container ports on localhost.
2.2 Network
- All services run as Docker containers on a shared Docker network.
- Selected ports are published to `localhost` for direct access from the browser and local tools.
- No public domains are used in development; everything is addressed via `http://localhost/...`.
2.3 Domains & IP Addresses
Local development uses:
- `http://localhost:3000` → frontend (Next.js dev/server container)
- `http://localhost:8000` → backend API (FastAPI)
  - Example auth/session endpoints:
    - `POST http://localhost:8000/auth/login`
    - `GET http://localhost:8000/sessions/default`
    - `POST http://localhost:8000/sessions/default/token`
- `ws://localhost:7880` → LiveKit signaling (local LiveKit server)
- `udp://localhost:50000–60000` → LiveKit/WebRTC media
No /etc/hosts changes or TLS certificates are required; localhost acts as a secure origin for WebRTC.
2.4 Ports & Protocols
| Port | Protocol | Purpose |
|---|---|---|
| 3000 | TCP | Frontend (Next.js) |
| 8000 | TCP | Backend API (FastAPI) |
| 5432 | TCP | Postgres + pgvector |
| 7880 | TCP | LiveKit HTTP + WS signaling |
| 50000–60000 | UDP | LiveKit WebRTC media (audio, data) |
2.5 Routing
No local Caddy or reverse proxy layer is used; routing is direct via published ports.
Internal Container Routing (Docker network)
- Backend → Postgres: `postgres://postgres:5432`
- Backend → LiveKit: `http://livekit:7880`
- Frontend (server-side) → Backend: `http://backend:8000`
Browser → Containers (via localhost)
- Browser → Frontend: `http://localhost:3000`
- Browser → Backend API: `http://localhost:8000`
Outgoing (from Backend)
- `backend` → `https://api.openai.com/v1/realtime/sessions`
- `backend` → `wss://api.openai.com/v1/realtime?model=gpt-realtime`
- `backend` → `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`
These calls mirror production agent behavior while pointing to the same cloud LLM realtime endpoints.
2.6 Functional Layers
Data Layer
- Local Postgres instance mirrors the production schema (including pgvector).
- Database migrations are applied via backend tooling (e.g., Alembic) to keep schema in sync.
Control Layer
- Backend runs full application logic locally:
  - Authentication and authorization
  - Lesson and progress APIs
  - LiveKit session management (`/sessions/default`, `/sessions/default/token`) and agent control
- Frontend integrates against the same API surface as production, only with `localhost` URLs.
Media Layer
- Local LiveKit instance handles:
  - WS/HTTP signaling on port 7880
  - WebRTC media (audio + data channels) on UDP 50000–60000
- Agent traffic mirrors production logic:
  - LiveKit ↔ Backend ↔ LLM realtime APIs (OpenAI / Gemini).
2.7 Typical Workflows
Developer Pushes Code
- Developer pushes to `git.avaaz.ai` over HTTPS or SSH.
- CI runs automatically (linting, tests, build validation). No deployment occurs.
- When a release is ready, the developer creates a version tag (`v*`) on a commit in `main`.
- CD triggers: validates the tag, rebuilds from the tagged commit, deploys updated containers, then performs post-deploy health checks.
App Development
- Start the stack: `docker compose -f docker-compose.dev.yml up -d`
- Open the app in the browser: `http://localhost:3000`
- Frontend calls the local backend for:
  - `POST http://localhost:8000/auth/login`
  - `GET http://localhost:8000/sessions/default`
  - `POST http://localhost:8000/sessions/default/token`
API Testing
- Health check:

  ```bash
  curl http://localhost:8000/health
  ```

- Auth and session testing:

  ```bash
  curl -X POST http://localhost:8000/auth/login \
    -H "Content-Type: application/json" \
    -d '{"email": "user@example.com", "password": "password"}'

  curl http://localhost:8000/sessions/default \
    -H "Authorization: Bearer <access_token>"
  ```
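The same checks can be automated with the Pytest + HTTPX stack the backend already uses. A minimal sketch, assuming the login payload and response fields shown above:

```python
# test_smoke.py - hypothetical local smoke tests; run with `pytest -q`.
import httpx
import pytest

BASE_URL = "http://localhost:8000"


def test_health() -> None:
    resp = httpx.get(f"{BASE_URL}/health", timeout=5.0)
    assert resp.status_code == 200


def test_login_and_load_session() -> None:
    login = httpx.post(
        f"{BASE_URL}/auth/login",
        json={"email": "user@example.com", "password": "password"},
        timeout=5.0,
    )
    if login.status_code == 401:
        pytest.skip("test user not seeded in the local database")
    assert login.status_code == 200
    token = login.json()["access_token"]  # field name assumed

    session = httpx.get(
        f"{BASE_URL}/sessions/default",
        headers={"Authorization": f"Bearer {token}"},
        timeout=5.0,
    )
    assert session.status_code == 200
```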
LiveKit Testing
- Frontend connects to LiveKit via:
  - Signaling: `ws://localhost:7880`
  - WebRTC media: `udp://localhost:50000–60000`
- Backend issues local LiveKit tokens via `POST http://localhost:8000/sessions/default/token`, then connects the AI agent to the local room.
2.8 Hardware
| Class | Description |
|---|---|
| system | HP Laptop 14-em0xxx |
| bus | 8B27 motherboard bus |
| memory | 128KiB BIOS |
| processor | AMD Ryzen 3 7320U |
| memory | 256KiB L1 cache |
| memory | 2MiB L2 cache |
| memory | 4MiB L3 cache |
| memory | 8GiB System Memory |
| bridge | Family 17h-19h PCIe Root Complex |
| generic | Family 17h-19h IOMMU |
| storage | SK hynix BC901 HFS256GE SSD |
| disk | 256GB NVMe disk |
| volume | 299MiB Windows FAT volume |
| volume | 238GiB EXT4 volume |
| network | RTL8852BE PCIe 802.11ax Wi-Fi |
| display | Mendocino integrated graphics |
| multimedia | Rembrandt Radeon High Definition Audio |
| generic | Family 19h PSP/CCP |
| bus | AMD xHCI Host Controller |
| input | Logitech M705 Mouse |
| input | Logitech K370s/K375s Keyboard |
| multimedia | Jabra SPEAK 510 USB |
| multimedia | Logitech Webcam C925e |
| communication | Bluetooth Radio |
| multimedia | HP True Vision HD Camera |
| bus | FCH SMBus Controller |
| bridge | FCH LPC Bridge |
| power | AE03041 Battery |
| input | Power Button |
| input | Lid Switch |
| input | HP WMI Hotkeys |
| input | AT Translated Set 2 Keyboard |
| input | Video Bus |
| input | SYNA32D9:00 06CB:CE17 Mouse |
| input | SYNA32D9:00 06CB:CE17 Touchpad |
| network | Ethernet Interface |