# System Architecture
Below is a summary of the **Production VPS** and **Development Laptop** architectures. Both environments use Docker containers for consistency, with near-identical stacks where practical.
```mermaid
flowchart LR
%% Client
A(Browser / PWA)
Y(iOS App / Android App)
subgraph User
A
Y
end
%% LLM / Realtime
B(OpenAI Realtime API)
Z(Gemini Live API)
subgraph Large Language Model
B
Z
end
%% Server-side
C(Caddy)
I(Gitea + Actions + Repositories)
J(Gitea Runner)
D(Next.js Frontend)
E(FastAPI Backend + Agent Runtime)
G(LiveKit Server)
H[(PostgreSQL + pgvector)]
%% Client ↔ VPS
A <-- https://www.avaaz.ai --> C
A <-- https://app.avaaz.ai --> C
A & Y <-- https://api.avaaz.ai --> C
A & Y <-- wss://rtc.avaaz.ai --> C
A & Y <-- "udp://rtc.avaaz.ai:50000-60000 (WebRTC Media)" --> G
%% Caddy ↔ App
C <-- "http://frontend:3000 (app)" --> D
C <-- "http://backend:8000 (api)" --> E
C <-- "ws://livekit:7880 (WebRTC signaling)" --> G
C <-- "http://gitea:3000 (git)" --> I
%% App internal
D <-- "http://backend:8000" --> E
E <-- "postgresql://postgres:5432" --> H
E <-- "http://livekit:7880 (control)" --> G
E <-- "Agent joins via WebRTC" --> G
%% Agent ↔ LLM
E <-- "WSS/WebRTC (realtime)" --> B
E <-- "WSS (streaming)" --> Z
%% CI/CD
I <-- "CI/CD triggers" --> J
subgraph VPS
subgraph Infra
C
I
J
end
subgraph App
D
E
G
H
end
end
%% Development Environment
L(VS Code + Git + Docker)
M(Local Docker Compose)
N(Local Browser)
O(Local Frontend)
P(Local Backend)
Q[(Local Postgres)]
R(Local LiveKit)
L <-- "https://git.avaaz.ai/...git" --> C
L <-- "ssh://git@git.avaaz.ai:2222/..." --> I
L -- "docker compose up" --> M
M -- "Build & Run" --> O & P & Q & R
N <-- HTTP --> O & P
N <-- WebRTC --> R
O <-- HTTP --> P
P <-- SQL --> Q
P <-- HTTP/WebRTC --> R
P <-- WSS/WebRTC --> B
P <-- WSS --> Z
subgraph Development Laptop
L
M
N
subgraph Local App
O
P
Q
R
end
end
```
## 1. Production VPS
### 1.1 Components
#### Infra Stack
Docker Compose: `./infra/docker-compose.yml`.
| Container | Description |
| -------------- | ----------------------------------------------------------------------------------- |
| `caddy` | **Caddy**: Reverse proxy with automatic HTTPS (TLS termination via Let's Encrypt). |
| `gitea` | **Gitea + Actions**: Git server using SQLite. Automated CI/CD workflows. |
| `gitea-runner` | **Gitea Runner**: Executes CI/CD jobs defined in Gitea Actions workflows. |
#### App Stack
Docker Compose: `./app/docker-compose.yml`.
| Container | Description |
| ---------- | ----------------------------------------------------------------------------------------- |
| `frontend` | **Next.js Frontend**: SPA/PWA interface served from a Node.js-based Next.js server. |
| `backend` | **FastAPI + Uvicorn Backend**: API, auth, business logic, LiveKit orchestration, agent. |
| `postgres` | **PostgreSQL + pgvector**: Persistent relational database with vector search. |
| `livekit` | **LiveKit Server**: WebRTC signaling plus UDP media for real-time audio and data. |
The `backend` delivers these services with Python tooling and libraries including UV, Ruff, FastAPI, FastAPI Users, FastAPI-pagination, FastStream, FastMCP, Pydantic, PydanticAI, Pydantic-settings, the LiveKit Agents SDK, the Google Gemini Live API, the OpenAI Realtime API, SQLAlchemy, Alembic, docling, Gunicorn, Uvicorn[standard], Pyright, Pytest, Hypothesis, and Httpx.
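As a minimal sketch of how this stack hangs together (not the actual backend code; the setting names and defaults below are assumptions), a FastAPI app with Pydantic-settings configuration and the `/health` endpoint probed by the deployment checks might look like:

```python
# Minimal sketch: FastAPI + pydantic-settings. Setting names, defaults, and the
# placeholder credentials are illustrative assumptions, not the real backend.
from fastapi import FastAPI
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Read from environment variables (or a .env file) at startup.
    database_url: str = "postgresql://app:app@postgres:5432/app"
    livekit_url: str = "http://livekit:7880"


settings = Settings()
app = FastAPI(title="backend (sketch)")


@app.get("/health")
async def health() -> dict[str, str]:
    # Probed by the CD health checks and by `curl http://localhost:8000/health`.
    return {"status": "ok"}
```

In production, Gunicorn with Uvicorn[standard] workers (both in the list above) would typically serve an app like this.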
### 1.2 Network
- All containers join a shared `proxy` Docker network.
- Caddy can route to any service by container name.
- App services communicate internally:
- Frontend ↔ Backend
- Backend ↔ Postgres
- Backend ↔ LiveKit
- Backend (agent) ↔ LiveKit & external LLM realtime APIs
### 1.3 Public DNS Records
| Hostname | Record Type | Target | Purpose |
| -------------------- | :---------: | -------------- | -------------------------------- |
| **www\.avaaz\.ai** | CNAME | avaaz.ai | Marketing / landing site |
| **avaaz.ai** | A | 217.154.51.242 | Root domain |
| **app.avaaz.ai** | A | 217.154.51.242 | Next.js frontend (SPA/PWA) |
| **api.avaaz.ai** | A | 217.154.51.242 | FastAPI backend |
| **rtc.avaaz.ai** | A | 217.154.51.242 | LiveKit signaling + media |
| **git.avaaz.ai** | A | 217.154.51.242 | Gitea (HTTPS + SSH) |
### 1.4 Public Inbound Firewall Ports & Protocols
| Port | Protocol | Purpose |
| -------------: | :------: | --------------------------------------- |
| **80** | TCP | HTTP, ACME HTTP-01 challenge |
| **443** | TCP | HTTPS, WSS (frontend, backend, LiveKit) |
| **2222** | TCP | Git SSH via Gitea |
| **2885** | TCP | VPS SSH access |
| **3478** | UDP | STUN/TURN |
| **5349** | TCP | TURN over TLS |
| **7881** | TCP | LiveKit TCP fallback |
| **50000-60000** | UDP | LiveKit WebRTC media |
### 1.5 Routing
#### Caddy
Caddy routes traffic from public ports 80 and 443 to internal services.
- `https://www.avaaz.ai` → `http://frontend:3000`
- `https://app.avaaz.ai` → `http://frontend:3000`
- `https://api.avaaz.ai` → `http://backend:8000`
- `wss://rtc.avaaz.ai` → `ws://livekit:7880`
- `https://git.avaaz.ai` → `http://gitea:3000`
#### Internal Container Network
- `frontend` → `http://backend:8000`
- `backend` → `postgres://postgres:5432`
- `backend` → `http://livekit:7880` (control)
- `backend` → `ws://livekit:7880` (signaling)
- `backend` → `udp://livekit:50000-60000` (media)
- `gitea-runner` → `/var/run/docker.sock` (Docker API on host)
#### Outgoing
- `backend` → `https://api.openai.com/v1/realtime/sessions`
- `backend` → `wss://api.openai.com/v1/realtime?model=gpt-realtime`
- `backend` → `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`
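For illustration, the first of these outgoing calls (creating a Realtime session over HTTPS before upgrading to WSS) could be issued with HTTPX roughly as below; the request payload and response shape are assumptions, so consult the OpenAI Realtime documentation for the authoritative schema.

```python
# Sketch: create an OpenAI Realtime session via the REST endpoint listed above.
# The JSON payload and the response contents are assumptions; check the official
# OpenAI Realtime documentation for the exact schema.
import os

import httpx


async def create_realtime_session() -> dict:
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    }
    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.post(
            "https://api.openai.com/v1/realtime/sessions",
            headers=headers,
            json={"model": "gpt-realtime"},  # model name taken from the WSS URL above
        )
        resp.raise_for_status()
        return resp.json()
```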
### 1.6 Functional Layers
#### Data Layer
**Infra:**
- **SQLite (Gitea)**
- Gitea stores Git metadata (users, repos, issues, Actions metadata) in `/data/gitea/gitea.db`.
- This is a file-backed SQLite database inside a persistent Docker volume.
- Repository contents are stored under `/data/git/`, also volume-backed.
- **Gitea Runner State**
- Gitea Actions runner stores its registration information and job metadata under `/data/.runner`.
**App:**
- **PostgreSQL with pgvector**
- Primary relational database for users, lessons, transcripts, embeddings, and conversational context.
- Hosted in the `postgres` container with a persistent Docker volume.
- Managed via SQLAlchemy and Alembic migrations in the backend.
- **LiveKit Ephemeral State**
- Room metadata, participant state, and signaling information are held in memory within the `livekit` container.
- LiveKit's SFU media buffers and room state are **not** persisted across restarts.
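As a concrete sketch of the pgvector-backed schema managed through SQLAlchemy and Alembic, the model below stores a transcript chunk alongside its embedding; the table name, columns, and embedding dimension are hypothetical.

```python
# Sketch: a pgvector-backed SQLAlchemy model. Table/column names and the
# embedding dimension (1536) are hypothetical, not the real schema.
from pgvector.sqlalchemy import Vector
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class TranscriptChunk(Base):
    __tablename__ = "transcript_chunks"

    id: Mapped[int] = mapped_column(primary_key=True)
    session_id: Mapped[int] = mapped_column(index=True)
    text: Mapped[str]
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))  # pgvector column
```

An Alembic migration would create this table and enable the `vector` extension (`CREATE EXTENSION IF NOT EXISTS vector`) before it is used.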
#### Control Layer
**Infra:**
- **Caddy**
- TLS termination (Let's Encrypt).
- Reverse proxy and routing for all public domains.
- ACME certificate renewal.
- **Gitea**
- Git hosting, pull/clone over SSH and HTTPS.
- CI/CD orchestration via Actions and internal APIs.
- **Gitea Runner**
- Executes workflows and controls the Docker engine via `/var/run/docker.sock`.
**App:**
- **FastAPI Backend**
- Authentication and authorization (`/auth/login`, `/auth/refresh`, `/auth/me`).
- REST APIs for lessons, progress, documents, and file handling.
- LiveKit session management (room mapping `/sessions/default`, token minting `/sessions/default/token`, agent configuration).
- Calls out to OpenAI Realtime and Gemini Live APIs for AI-driven conversational behavior.
- **LiveKit Server**
- Manages room signaling, participant permissions, and session state.
- Exposes HTTP control endpoint for room and participant management.
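A hedged sketch of what room management over that control endpoint can look like from the backend, assuming the `livekit-api` Python package; the room name and credential variables are placeholders.

```python
# Sketch: backend-side room management against LiveKit's HTTP control endpoint.
# Assumes the livekit-api Python package; room name and credentials are placeholders.
import asyncio
import os

from livekit import api


async def ensure_room(name: str = "session-default") -> None:
    lkapi = api.LiveKitAPI(
        url="http://livekit:7880",  # internal Docker network address (see 1.5)
        api_key=os.environ["LIVEKIT_API_KEY"],
        api_secret=os.environ["LIVEKIT_API_SECRET"],
    )
    try:
        await lkapi.room.create_room(api.CreateRoomRequest(name=name))
        rooms = await lkapi.room.list_rooms(api.ListRoomsRequest())
        print([room.name for room in rooms.rooms])
    finally:
        await lkapi.aclose()


if __name__ == "__main__":
    asyncio.run(ensure_room())
```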
#### Media Layer
**App:**
- **User Audio Path**
- Browser/mobile → LiveKit:
- WSS signaling via `rtc.avaaz.ai` → Caddy → `livekit:7880`.
- UDP audio and data channels via `rtc.avaaz.ai:50000-60000` directly to LiveKit on the VPS.
- WebRTC handles ICE, STUN/TURN, jitter buffers, and Opus audio encoding.
- **AI Agent Audio Path**
- The agent logic inside the backend uses LiveKit Agent SDK to join rooms as a participant.
- Agent → LiveKit:
- WS signaling over the internal Docker network (`ws://livekit:7880`).
- UDP audio transport as part of its WebRTC session.
- Agent → LLM realtime API:
- Secure WSS/WebRTC connection to OpenAI Realtime or Gemini Live.
- The agent transcribes, processes, and generates audio responses, publishing them into the LiveKit room so the user hears natural speech.
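A minimal sketch of the agent side of this path, assuming the `livekit-agents` 1.x API with its OpenAI realtime plugin; the instructions string and voice are illustrative, not the production agent configuration.

```python
# Sketch: an agent worker that joins LiveKit rooms and bridges audio to an LLM
# realtime API. Assumes the livekit-agents 1.x API and its OpenAI plugin; the
# instructions and voice below are illustrative only.
from livekit import agents
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions
from livekit.plugins import openai


async def entrypoint(ctx: JobContext) -> None:
    # Join the assigned room (signaling over ws://livekit:7880 inside Docker).
    await ctx.connect()

    session = AgentSession(
        # The realtime model carries speech in and out over a single WSS/WebRTC link.
        llm=openai.realtime.RealtimeModel(voice="alloy"),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a friendly spoken-language tutor."),
    )


if __name__ == "__main__":
    agents.cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```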
### 1.7 CI/CD Pipeline
Production CI/CD is handled by **Gitea Actions** running on the VPS. The `gitea-runner` container has access to the host Docker daemon and is responsible for both validation and deployment:
- `.gitea/workflows/ci.yml`: **Continuous Integration** (branch/PR validation, no deployment).
- `.gitea/workflows/cd.yml`: **Continuous Deployment** (tag-based releases to production).
#### Build Phase (CI Workflow: `ci.yml`)
**Triggers**
- `push` to:
- `feature/**`
- `bugfix/**`
- `pull_request` targeting `main`.
**Runner & Environment**
- Runs on the self-hosted runner labeled `linux_amd64`.
- Checks out the relevant branch or PR commit from the `avaaz-app` repository into the runner's workspace.
**Steps**
1. **Checkout code**
Uses `actions/checkout@v4` to fetch the branch or PR head commit.
2. **Report triggering context**
Logs the event type (`push` or `pull_request`) and branches:
- For `push`: the source branch (e.g., `feature/foo`).
- For `pull_request`: source and target (`main`).
3. **Static analysis & tests**
- Run linters, type checkers, and unit tests for backend and frontend.
- Ensure the application code compiles/builds.
4. **Build Docker images for CI**
- Build images (e.g., `frontend:ci` and `backend:ci`) to validate Dockerfiles and build chain.
- These images are tagged for CI only and not used for production.
5. **Cleanup CI images**
- Remove CI-tagged images at the end of the job (even on failure) to prevent disk usage from accumulating.
**Outcome**
- A green CI result on a branch/PR signals that:
- The code compiles/builds.
- Static checks and tests pass.
- Docker images can be built successfully.
- CI does **not** modify the production stack and does **not** depend on tags.
#### Deploy Phase (CD Workflow: `cd.yml`)
**Triggers**
- Creation of a Git tag matching `v*` that points to a commit on the `main` branch in the `avaaz-app` repository.
**Runner & Environment**
- Runs on the same `linux_amd64` self-hosted runner.
- Checks out the exact commit referenced by the tag.
**Steps**
1. **Checkout tagged commit**
- Uses `actions/checkout@v4` with `ref: ${{ gitea.ref }}` to check out the tagged commit.
2. **Tag validation**
- Fetches `origin/main`.
- Verifies that the tag commit is an ancestor of `origin/main` (i.e., the tag points to code that has been merged into `main`).
- Fails the deployment if the commit is not in the history of `main`.
3. **Build & publish release**
- Builds production Docker images for frontend, backend, LiveKit, etc., tagged with the version (e.g., `v0.1.0`).
- Applies database migrations (e.g., via Alembic) if required.
4. **Restart production stack**
- Restarts or recreates the app stack containers using the newly built/tagged images (e.g., via `docker compose -f docker-compose.yml up -d`).
5. **Health & readiness checks**
- Probes key endpoints with `curl -f`, such as:
- `https://app.avaaz.ai`
- `https://api.avaaz.ai/health`
- `wss://rtc.avaaz.ai` (signaling-level check)
- If checks fail, marks the deployment as failed and automatically rolls back to previous images.
**Outcome**
- Only tagged releases whose commits are on the `main` branch are deployed.
- Deployment is explicit (tag-based), separated from CI validation.
### 1.8 Typical Workflows
#### User Login
1. Browser loads the frontend from `https://app.avaaz.ai`.
2. Frontend submits credentials to `POST https://api.avaaz.ai/auth/login`.
3. Backend validates credentials and returns (see the JWT sketch after this list):
- A short-lived JWT **access token**
- A long-lived opaque **refresh token**
- A minimal user profile for immediate UI hydration
4. Frontend stores tokens appropriately (access token in memory; refresh token in secure storage or an httpOnly cookie).
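As an illustration of step 3, the sketch below mints a short-lived access token with PyJWT; in the real backend FastAPI Users handles token issuance, so the claim names, lifetime, and signing setup here are assumptions.

```python
# Sketch: minting a short-lived JWT access token. Claim names, lifetime, and the
# signing secret are assumptions; the backend's FastAPI Users setup is authoritative.
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET = "change-me"                      # placeholder; load from settings in practice
ACCESS_TOKEN_TTL = timedelta(minutes=15)  # "short-lived", as described above


def mint_access_token(user_id: str) -> str:
    now = datetime.now(timezone.utc)
    claims = {"sub": user_id, "iat": now, "exp": now + ACCESS_TOKEN_TTL}
    return jwt.encode(claims, SECRET, algorithm="HS256")
```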
#### Load Persistent Session
1. Frontend calls `GET https://api.avaaz.ai/sessions/default`.
2. Backend retrieves or creates the users **persistent conversational session**, which encapsulates:
- Long-running conversation state
- Lesson and progress context
- Historical summary for LLM context initialization
3. Backend prepares the session's LLM context so that the agent can join with continuity.
#### Join the Live Conversation Session
1. Frontend requests a LiveKit access token via `POST https://api.avaaz.ai/sessions/default/token`.
2. Backend generates a **new LiveKit token** (short-lived, room-scoped; see the token-minting sketch after this list), containing:
- Identity
- Publish/subscribe permissions
- Expiration (bounding the window in which the token can be used to join)
- Room ID corresponding to the session
3. Frontend connects to the LiveKit server:
- WSS for signaling
- UDP/SCTP for low-latency audio and file transfer
4. If the user disconnects, the frontend requests a new LiveKit token before rejoining, ensuring seamless continuity.
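A hedged sketch of the token described in step 2, assuming the `livekit-api` Python package; the identity, room name, and TTL are placeholders.

```python
# Sketch: minting a short-lived, room-scoped LiveKit token. Assumes the
# livekit-api Python package; identity, room, and TTL are placeholders.
import os
from datetime import timedelta

from livekit import api


def mint_livekit_token(user_id: str, room: str = "session-default") -> str:
    token = (
        api.AccessToken(
            os.environ["LIVEKIT_API_KEY"],
            os.environ["LIVEKIT_API_SECRET"],
        )
        .with_identity(user_id)           # identity
        .with_ttl(timedelta(minutes=10))  # expiration bounding the join window
        .with_grants(
            api.VideoGrants(
                room_join=True,
                room=room,                # room mapped to the persistent session
                can_publish=True,         # publish/subscribe permissions
                can_subscribe=True,
            )
        )
    )
    return token.to_jwt()
```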
#### Conversation with AI Agent
1. Backend configures the session's **AI agent** using:
- Historical summary
- Current lesson state
- Language settings and mode (lesson, mock exam, free talk)
2. The agent joins the same LiveKit room as a participant.
3. All media flows through LiveKit:
- User → audio → LiveKit → Agent
- Agent → LLM realtime API → synthesized audio → LiveKit → User
4. The agent guides the user verbally: continuing lessons, revisiting material, running mock exams, or free conversation.
The user experiences this as a **continuous, ongoing session** with seamless reconnection and state persistence.
### 1.9 Hardware
| Class | Description |
|----------------|-------------------------------------------|
| system | Standard PC (i440FX + PIIX, 1996) |
| bus | Motherboard |
| memory | 96KiB BIOS |
| processor | AMD EPYC-Milan Processor |
| memory | 8GiB System Memory |
| bridge | 440FX - 82441FX PMC [Natoma] |
| bridge | 82371SB PIIX3 ISA [Natoma/Triton II] |
| communication | PnP device PNP0501 |
| input | PnP device PNP0303 |
| input | PnP device PNP0f13 |
| storage | PnP device PNP0700 |
| system | PnP device PNP0b00 |
| storage | 82371SB PIIX3 IDE [Natoma/Triton II] |
| bus | 82371SB PIIX3 USB [Natoma/Triton II] |
| bus | UHCI Host Controller |
| input | QEMU USB Tablet |
| bridge | 82371AB/EB/MB PIIX4 ACPI |
| display | QXL paravirtual graphic card |
| generic | Virtio RNG |
| storage | Virtio block device |
| disk | 257GB Virtual I/O device |
| volume | 238GiB EXT4 volume |
| volume | 4095KiB BIOS Boot partition |
| volume | 105MiB Windows FAT volume |
| volume | 913MiB EXT4 volume |
| network | Virtio network device |
| network | Ethernet interface |
| input | Power Button |
| input | AT Translated Set 2 keyboard |
| input | VirtualPS/2 VMware VMMouse |
## 2. Development Laptop
### 2.1 Components
#### App Stack (local Docker)
- `frontend` (Next.js SPA)
- `backend` (FastAPI)
- `postgres` (PostgreSQL + pgvector)
- `livekit` (local LiveKit Server)
No Caddy is deployed locally; the browser talks directly to the mapped container ports on `localhost`.
### 2.2 Network
- All services run as Docker containers on a shared Docker network.
- Selected ports are published to `localhost` for direct access from the browser and local tools.
- No public domains are used in development; everything is addressed via `http://localhost:<port>/...`.
### 2.3 Domains & IP Addresses
Local development uses:
- `http://localhost:3000` → frontend (Next.js dev/server container)
- `http://localhost:8000` → backend API (FastAPI)
- Example auth/session endpoints:
- `POST http://localhost:8000/auth/login`
- `GET http://localhost:8000/sessions/default`
- `POST http://localhost:8000/sessions/default/token`
- `ws://localhost:7880` → LiveKit signaling (local LiveKit server)
- `udp://localhost:50000-60000` → LiveKit/WebRTC media
No `/etc/hosts` changes or TLS certificates are required; `localhost` acts as a secure origin for WebRTC.
### 2.4 Ports & Protocols
| Port | Protocol | Purpose |
|-------------:|:--------:|------------------------------------|
| 3000 | TCP | Frontend (Next.js) |
| 8000 | TCP | Backend API (FastAPI) |
| 5432 | TCP | Postgres + pgvector |
| 7880 | TCP | LiveKit HTTP + WS signaling |
| 50000-60000 | UDP | LiveKit WebRTC media (audio, data) |
### 2.5 Routing
No local Caddy or reverse proxy layer is used; routing is direct via published ports.
#### Internal Container Routing (Docker network)
- Backend → Postgres: `postgres://postgres:5432`
- Backend → LiveKit: `http://livekit:7880`
- Frontend (server-side) → Backend: `http://backend:8000`
#### Browser → Containers (via localhost)
- Browser → Frontend: `http://localhost:3000`
- Browser → Backend API: `http://localhost:8000`
#### Outgoing (from Backend)
- `backend` → `https://api.openai.com/v1/realtime/sessions`
- `backend` → `wss://api.openai.com/v1/realtime?model=gpt-realtime`
- `backend` → `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`
These calls mirror production agent behavior while pointing to the same cloud LLM realtime endpoints.
### 2.6 Functional Layers
#### Data Layer
- Local Postgres instance mirrors the production schema (including pgvector).
- Database migrations are applied via backend tooling (e.g., Alembic) to keep schema in sync.
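One way to keep the local schema in sync is to apply migrations programmatically when the backend container starts, equivalent to running `alembic upgrade head`; the `alembic.ini` path below is an assumption about the container layout.

```python
# Sketch: apply Alembic migrations at start-up (equivalent to `alembic upgrade head`).
# The alembic.ini location is an assumption about the backend container layout.
from alembic import command
from alembic.config import Config


def run_migrations() -> None:
    cfg = Config("alembic.ini")
    command.upgrade(cfg, "head")


if __name__ == "__main__":
    run_migrations()
```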
#### Control Layer
- Backend runs full application logic locally:
- Authentication and authorization
- Lesson and progress APIs
- LiveKit session management (`/sessions/default`, `/sessions/default/token`) and agent control
- Frontend integrates against the same API surface as production, only with `localhost` URLs.
#### Media Layer
- Local LiveKit instance handles:
- WS/HTTP signaling on port 7880
- WebRTC media (audio + data channels) on UDP `50000-60000`
- Agent traffic mirrors production logic:
- LiveKit ↔ Backend ↔ LLM realtime APIs (OpenAI / Gemini).
### 2.7 Typical Workflows
#### Developer Pushes Code
1. Developer pushes to `git.avaaz.ai` over HTTPS or SSH.
2. CI runs automatically (linting, tests, build validation). No deployment occurs.
3. When a release is ready, the developer creates a version tag (`v*`) on a commit in `main`.
4. CD triggers: validates the tag, rebuilds from the tagged commit, deploys updated containers, then performs post-deploy health checks.
#### App Development
- Start the stack: `docker compose -f docker-compose.dev.yml up -d`
- Open the app in the browser: `http://localhost:3000`
- Frontend calls the local backend for:
- `POST http://localhost:8000/auth/login`
- `GET http://localhost:8000/sessions/default`
- `POST http://localhost:8000/sessions/default/token`
#### API Testing
- Health check: `curl http://localhost:8000/health`
- Auth and session testing:
```bash
curl -X POST http://localhost:8000/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "user@example.com", "password": "password"}'
curl http://localhost:8000/sessions/default \
-H "Authorization: Bearer <access_token>"
```
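Because the backend toolchain includes Pytest and HTTPX, the same checks can be scripted as a test against the local stack; the asserted response field (`access_token`) is an assumption about the login response shape.

```python
# Sketch: the curl checks above as a pytest test using httpx. The asserted
# "access_token" field is an assumption about the login response shape.
import httpx

BASE_URL = "http://localhost:8000"


def test_login_and_load_session() -> None:
    with httpx.Client(base_url=BASE_URL) as client:
        login = client.post(
            "/auth/login",
            json={"email": "user@example.com", "password": "password"},
        )
        assert login.status_code == 200
        token = login.json()["access_token"]  # assumed field name

        session = client.get(
            "/sessions/default",
            headers={"Authorization": f"Bearer {token}"},
        )
        assert session.status_code == 200
```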
#### LiveKit Testing
- Frontend connects to LiveKit via:
- Signaling: `ws://localhost:7880`
- WebRTC media: `udp://localhost:50000-60000`
- Backend issues local LiveKit tokens via `POST http://localhost:8000/sessions/default/token`, then connects the AI agent to the local room.
### 2.8 Hardware
| Class | Description |
|----------------|--------------------------------------------|
| system | HP Laptop 14-em0xxx |
| bus | 8B27 motherboard bus |
| memory | 128KiB BIOS |
| processor | AMD Ryzen 3 7320U |
| memory | 256KiB L1 cache |
| memory | 2MiB L2 cache |
| memory | 4MiB L3 cache |
| memory | 8GiB System Memory |
| bridge | Family 17h-19h PCIe Root Complex |
| generic | Family 17h-19h IOMMU |
| storage | SK hynix BC901 HFS256GE SSD |
| disk | 256GB NVMe disk |
| volume | 299MiB Windows FAT volume |
| volume | 238GiB EXT4 volume |
| network | RTL8852BE PCIe 802.11ax Wi-Fi |
| display | Mendocino integrated graphics |
| multimedia | Rembrandt Radeon High Definition Audio |
| generic | Family 19h PSP/CCP |
| bus | AMD xHCI Host Controller |
| input | Logitech M705 Mouse |
| input | Logitech K370s/K375s Keyboard |
| multimedia | Jabra SPEAK 510 USB |
| multimedia | Logitech Webcam C925e |
| communication | Bluetooth Radio |
| multimedia | HP True Vision HD Camera |
| bus | FCH SMBus Controller |
| bridge | FCH LPC Bridge |
| power | AE03041 Battery |
| input | Power Button |
| input | Lid Switch |
| input | HP WMI Hotkeys |
| input | AT Translated Set 2 Keyboard |
| input | Video Bus |
| input | SYNA32D9:00 06CB:CE17 Mouse |
| input | SYNA32D9:00 06CB:CE17 Touchpad |
| network | Ethernet Interface |