# System Architecture
Below is a summary of the **Production VPS** and **Development Laptop** architectures. Both environments use Docker containers for consistency, with near-identical stacks where practical.
```mermaid
flowchart LR
%% Client
A(Browser / PWA)
Y(iOS App / Android App)
subgraph User
A
Y
end
%% LLM / Realtime
B(OpenAI Realtime API)
Z(Gemini Live API)
subgraph Large Language Model
B
Z
end
%% Server-side
C(Caddy)
I(Gitea + Actions + Repositories)
J(Gitea Runner)
D(Next.js Frontend)
E(FastAPI Backend + Agent Runtime)
G(LiveKit Server)
H[(PostgreSQL + pgvector)]
%% Client ↔ VPS
A <-- https://www.avaaz.ai --> C
A <-- https://app.avaaz.ai --> C
A & Y <-- https://api.avaaz.ai --> C
A & Y <-- wss://rtc.avaaz.ai --> C
A & Y <-- "udp://rtc.avaaz.ai:50000-60000 (WebRTC Media)" --> G
%% Caddy ↔ App
C <-- "http://frontend:3000 (app)" --> D
C <-- "http://backend:8000 (api)" --> E
C <-- "ws://livekit:7880 (WebRTC signaling)" --> G
C <-- "http://gitea:3000 (git)" --> I
%% App internal
D <-- "http://backend:8000" --> E
E <-- "postgresql://postgres:5432" --> H
E <-- "http://livekit:7880 (control)" --> G
E <-- "Agent joins via WebRTC" --> G
%% Agent ↔ LLM
E <-- "WSS/WebRTC (realtime)" --> B
E <-- "WSS (streaming)" --> Z
%% CI/CD
I <-- "CI/CD triggers" --> J
subgraph VPS
subgraph Infra
C
I
J
end
subgraph App
D
E
G
H
end
end
%% Development Environment
L(VS Code + Git + Docker)
M(Local Docker Compose)
N(Local Browser)
O(Local Frontend)
P(Local Backend)
Q[(Local Postgres)]
R(Local LiveKit)
L <-- "https://git.avaaz.ai/...git" --> C
L <-- "ssh://git@git.avaaz.ai:2222/..." --> I
L -- "docker compose up" --> M
M -- "Build & Run" --> O & P & Q & R
N <-- HTTP --> O & P
N <-- WebRTC --> R
O <-- HTTP --> P
P <-- SQL --> Q
P <-- HTTP/WebRTC --> R
P <-- WSS/WebRTC --> B
P <-- WSS --> Z
subgraph Development Laptop
L
M
N
subgraph Local App
O
P
Q
R
end
end
```
## 1. Production VPS
### 1.1 Components
#### Infra Stack
Docker Compose: `./infra/docker-compose.yml`.
| Container | Description |
| -------------- | ----------------------------------------------------------------------------------- |
| `caddy` | **Caddy**: Reverse proxy with automatic HTTPS (TLS termination via Let's Encrypt). |
| `gitea` | **Gitea + Actions**: Git server using SQLite. Automated CI/CD workflows. |
| `gitea-runner` | **Gitea Runner**: Executes CI/CD jobs defined in Gitea Actions workflows. |
#### App Stack
Docker Compose: `./app/docker-compose.yml`.
| Container | Description |
| ---------- | ----------------------------------------------------------------------------------------- |
| `frontend` | **Next.js Frontend**: SPA/PWA interface served from a Node.js-based Next.js server. |
| `backend` | **FastAPI + Uvicorn Backend**: API, auth, business logic, LiveKit orchestration, agent. |
| `postgres` | **PostgreSQL + pgvector**: Persistent relational database with vector search. |
| `livekit` | **LiveKit Server**: WebRTC signaling plus UDP media for real-time audio and data. |
The `backend` delivers these services with Python tooling and libraries including UV, Ruff, FastAPI, FastAPI Users, FastAPI-pagination, FastStream, FastMCP, Pydantic, PydanticAI, Pydantic-settings, the LiveKit Agents SDK, the Google Gemini Live API, the OpenAI Realtime API, SQLAlchemy, Alembic, docling, Gunicorn, Uvicorn[standard], Pyright, Pytest, Hypothesis, and Httpx.
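As a minimal sketch of how this stack hangs together (not the actual backend code; the setting names and defaults below are assumptions), a FastAPI app with Pydantic-settings configuration and the `/health` endpoint probed by the deployment checks might look like:

```python
# Minimal sketch: FastAPI + pydantic-settings. Setting names, defaults, and the
# placeholder credentials are illustrative assumptions, not the real backend.
from fastapi import FastAPI
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Read from environment variables (or a .env file) at startup.
    database_url: str = "postgresql://app:app@postgres:5432/app"
    livekit_url: str = "http://livekit:7880"


settings = Settings()
app = FastAPI(title="backend (sketch)")


@app.get("/health")
async def health() -> dict[str, str]:
    # Probed by the CD health checks and by `curl http://localhost:8000/health`.
    return {"status": "ok"}
```

In production, Gunicorn with Uvicorn[standard] workers (both in the list above) would typically serve an app like this.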
### 1.2 Network
- All containers join a shared `proxy` Docker network.
- Caddy can route to any service by container name.
- App services communicate internally:
- Frontend ↔ Backend
- Backend ↔ Postgres
- Backend ↔ LiveKit
- Backend (agent) ↔ LiveKit & external LLM realtime APIs
### 1.3 Public DNS Records
| Hostname | Record Type | Target | Purpose |
| -------------------- | :---------: | -------------- | -------------------------------- |
| **www\.avaaz\.ai** | CNAME | avaaz.ai | Marketing / landing site |
| **avaaz.ai** | A | 217.154.51.242 | Root domain |
| **app.avaaz.ai** | A | 217.154.51.242 | Next.js frontend (SPA/PWA) |
| **api.avaaz.ai** | A | 217.154.51.242 | FastAPI backend |
| **rtc.avaaz.ai** | A | 217.154.51.242 | LiveKit signaling + media |
| **git.avaaz.ai** | A | 217.154.51.242 | Gitea (HTTPS + SSH) |
### 1.4 Public Inbound Firewall Ports & Protocols
| Port | Protocol | Purpose |
| -------------: | :------: | --------------------------------------- |
| **80** | TCP | HTTP, ACME HTTP-01 challenge |
| **443** | TCP | HTTPS, WSS (frontend, backend, LiveKit) |
| **2222** | TCP | Git SSH via Gitea |
| **2885** | TCP | VPS SSH access |
| **3478** | UDP | STUN/TURN |
| **5349** | TCP | TURN over TLS |
| **7881** | TCP | LiveKit TCP fallback |
| **50000-60000** | UDP | LiveKit WebRTC media |
### 1.5 Routing
#### Caddy
Caddy routes traffic from public ports 80 and 443 to internal services.
- `https://www.avaaz.ai` → `http://frontend:3000`
- `https://app.avaaz.ai` → `http://frontend:3000`
- `https://api.avaaz.ai` → `http://backend:8000`
- `wss://rtc.avaaz.ai` → `ws://livekit:7880`
- `https://git.avaaz.ai` → `http://gitea:3000`
#### Internal Container Network
- `frontend` → `http://backend:8000`
- `backend` → `postgres://postgres:5432`
- `backend` → `http://livekit:7880` (control)
- `backend` → `ws://livekit:7880` (signaling)
- `backend` → `udp://livekit:50000-60000` (media)
- `gitea-runner` → `/var/run/docker.sock` (Docker API on host)
#### Outgoing
- `backend` → `https://api.openai.com/v1/realtime/sessions`
- `backend` → `wss://api.openai.com/v1/realtime?model=gpt-realtime`
- `backend` → `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`
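For illustration, the first of these outgoing calls (creating a Realtime session over HTTPS before upgrading to WSS) could be issued with HTTPX roughly as below; the request payload and response shape are assumptions, so consult the OpenAI Realtime documentation for the authoritative schema.

```python
# Sketch: create an OpenAI Realtime session via the REST endpoint listed above.
# The JSON payload and the response contents are assumptions; check the official
# OpenAI Realtime documentation for the exact schema.
import os

import httpx


async def create_realtime_session() -> dict:
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    }
    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.post(
            "https://api.openai.com/v1/realtime/sessions",
            headers=headers,
            json={"model": "gpt-realtime"},  # model name taken from the WSS URL above
        )
        resp.raise_for_status()
        return resp.json()
```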
### 1.6 Functional Layers
#### Data Layer
**Infra:**
- **SQLite (Gitea)**
- Gitea stores Git metadata (users, repos, issues, Actions metadata) in `/data/gitea/gitea.db`.
- This is a file-backed SQLite database inside a persistent Docker volume.
- Repository contents are stored under `/data/git/`, also volume-backed.
- **Gitea Runner State**
- Gitea Actions runner stores its registration information and job metadata under `/data/.runner`.
**App:**
- **PostgreSQL with pgvector**
- Primary relational database for users, lessons, transcripts, embeddings, and conversational context.
- Hosted in the `postgres` container with a persistent Docker volume.
- Managed via SQLAlchemy and Alembic migrations in the backend.
- **LiveKit Ephemeral State**
- Room metadata, participant state, and signaling information are held in memory within the `livekit` container.
- LiveKit's SFU media buffers and room state are **not** persisted across restarts.
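As a concrete sketch of the pgvector-backed schema managed through SQLAlchemy and Alembic, the model below stores a transcript chunk alongside its embedding; the table name, columns, and embedding dimension are hypothetical.

```python
# Sketch: a pgvector-backed SQLAlchemy model. Table/column names and the
# embedding dimension (1536) are hypothetical, not the real schema.
from pgvector.sqlalchemy import Vector
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class TranscriptChunk(Base):
    __tablename__ = "transcript_chunks"

    id: Mapped[int] = mapped_column(primary_key=True)
    session_id: Mapped[int] = mapped_column(index=True)
    text: Mapped[str]
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))  # pgvector column
```

An Alembic migration would create this table and enable the `vector` extension (`CREATE EXTENSION IF NOT EXISTS vector`) before it is used.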
#### Control Layer
**Infra:**
- **Caddy**
- TLS termination (Let's Encrypt).
- Reverse proxy and routing for all public domains.
- ACME certificate renewal.
- **Gitea**
- Git hosting, pull/clone over SSH and HTTPS.
- CI/CD orchestration via Actions and internal APIs.
- **Gitea Runner**
- Executes workflows and controls the Docker engine via `/var/run/docker.sock`.
**App:**
- **FastAPI Backend**
- Authentication and authorization (`/auth/login`, `/auth/refresh`, `/auth/me`).
- REST APIs for lessons, progress, documents, and file handling.
- LiveKit session management (room mapping `/sessions/default`, token minting `/sessions/default/token`, agent configuration).
- Calls out to OpenAI Realtime and Gemini Live APIs for AI-driven conversational behavior.
- **LiveKit Server**
- Manages room signaling, participant permissions, and session state.
- Exposes HTTP control endpoint for room and participant management.
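A hedged sketch of what room management over that control endpoint can look like from the backend, assuming the `livekit-api` Python package; the room name and credential variables are placeholders.

```python
# Sketch: backend-side room management against LiveKit's HTTP control endpoint.
# Assumes the livekit-api Python package; room name and credentials are placeholders.
import asyncio
import os

from livekit import api


async def ensure_room(name: str = "session-default") -> None:
    lkapi = api.LiveKitAPI(
        url="http://livekit:7880",  # internal Docker network address (see 1.5)
        api_key=os.environ["LIVEKIT_API_KEY"],
        api_secret=os.environ["LIVEKIT_API_SECRET"],
    )
    try:
        await lkapi.room.create_room(api.CreateRoomRequest(name=name))
        rooms = await lkapi.room.list_rooms(api.ListRoomsRequest())
        print([room.name for room in rooms.rooms])
    finally:
        await lkapi.aclose()


if __name__ == "__main__":
    asyncio.run(ensure_room())
```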
#### Media Layer
**App:**
- **User Audio Path**
- Browser/mobile → LiveKit:
- WSS signaling via `rtc.avaaz.ai` → Caddy → `livekit:7880`.
- UDP audio and data channels via `rtc.avaaz.ai:50000-60000` directly to LiveKit on the VPS.
- WebRTC handles ICE, STUN/TURN, jitter buffers, and Opus audio encoding.
- **AI Agent Audio Path**
- The agent logic inside the backend uses LiveKit Agent SDK to join rooms as a participant.
- Agent → LiveKit:
- WS signaling over the internal Docker network (`ws://livekit:7880`).
- UDP audio transport as part of its WebRTC session.
- Agent → LLM realtime API:
- Secure WSS/WebRTC connection to OpenAI Realtime or Gemini Live.
- The agent transcribes, processes, and generates audio responses, publishing them into the LiveKit room so the user hears natural speech.
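A minimal sketch of the agent side of this path, assuming the `livekit-agents` 1.x API with its OpenAI realtime plugin; the instructions string and voice are illustrative, not the production agent configuration.

```python
# Sketch: an agent worker that joins LiveKit rooms and bridges audio to an LLM
# realtime API. Assumes the livekit-agents 1.x API and its OpenAI plugin; the
# instructions and voice below are illustrative only.
from livekit import agents
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions
from livekit.plugins import openai


async def entrypoint(ctx: JobContext) -> None:
    # Join the assigned room (signaling over ws://livekit:7880 inside Docker).
    await ctx.connect()

    session = AgentSession(
        # The realtime model carries speech in and out over a single WSS/WebRTC link.
        llm=openai.realtime.RealtimeModel(voice="alloy"),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a friendly spoken-language tutor."),
    )


if __name__ == "__main__":
    agents.cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```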
### 1.7 CI/CD Pipeline
Production CI/CD is handled by **Gitea Actions** running on the VPS. The `gitea-runner` container has access to the host Docker daemon and is responsible for both validation and deployment:
- `.gitea/workflows/ci.yml`: **Continuous Integration** (branch/PR validation, no deployment).
- `.gitea/workflows/cd.yml`: **Continuous Deployment** (tag-based releases to production).
#### Build Phase (CI Workflow: `ci.yml`)
**Triggers**
- `push` to:
- `feature/**`
- `bugfix/**`
- `pull_request` targeting `main`.
**Runner & Environment**
- Runs on the self-hosted runner labeled `linux_amd64`.
- Checks out the relevant branch or PR commit from the `avaaz-app` repository into the runner's workspace.
**Steps**
1. **Checkout code**
Uses `actions/checkout@v4` to fetch the branch or PR head commit.
2. **Report triggering context**
Logs the event type (`push` or `pull_request`) and branches:
- For `push`: the source branch (e.g., `feature/foo`).
- For `pull_request`: source and target (`main`).
3. **Static analysis & tests**
- Run linters, type checkers, and unit tests for backend and frontend.
- Ensure the application code compiles/builds.
4. **Build Docker images for CI**
- Build images (e.g., `frontend:ci` and `backend:ci`) to validate Dockerfiles and build chain.
- These images are tagged for CI only and not used for production.
5. **Cleanup CI images**
- Remove CI-tagged images at the end of the job (even on failure) to prevent disk usage from accumulating.
**Outcome**
- A green CI result on a branch/PR signals that:
- The code compiles/builds.
- Static checks and tests pass.
- Docker images can be built successfully.
- CI does **not** modify the production stack and does **not** depend on tags.
#### Deploy Phase (CD Workflow: `cd.yml`)
**Triggers**
- Creation of a Git tag matching `v*` that points to a commit on the `main` branch in the `avaaz-app` repository.
**Runner & Environment**
- Runs on the same `linux_amd64` self-hosted runner.
- Checks out the exact commit referenced by the tag.
**Steps**
1. **Checkout tagged commit**
- Uses `actions/checkout@v4` with `ref: ${{ gitea.ref }}` to check out the tagged commit.
2. **Tag validation**
- Fetches `origin/main`.
- Verifies that the tag commit is an ancestor of `origin/main` (i.e., the tag points to code that has been merged into `main`).
- Fails the deployment if the commit is not in the history of `main`.
3. **Build & publish release**
- Builds production Docker images for frontend, backend, LiveKit, etc., tagged with the version (e.g., `v0.1.0`).
- Applies database migrations (e.g., via Alembic) if required.
4. **Restart production stack**
- Restarts or recreates the app stack containers using the newly built/tagged images (e.g., via `docker compose -f docker-compose.yml up -d`).
5. **Health & readiness checks**
- Probes key endpoints with `curl -f`, such as:
- `https://app.avaaz.ai`
- `https://api.avaaz.ai/health`
- `wss://rtc.avaaz.ai` (signaling-level check)
- If checks fail, marks the deployment as failed and automatically rolls back to previous images.
**Outcome**
- Only tagged releases whose commits are on the `main` branch are deployed.
- Deployment is explicit (tag-based), separated from CI validation.
### 1.8 Typical Workflows
#### User Login
1. Browser loads the frontend from `https://app.avaaz.ai`.
2. Frontend submits credentials to `POST https://api.avaaz.ai/auth/login`.
3. Backend validates credentials and returns (see the JWT sketch after this list):
- A short-lived JWT **access token**
- A long-lived opaque **refresh token**
- A minimal user profile for immediate UI hydration
4. Frontend stores tokens appropriately (access token in memory; refresh token in secure storage or an httpOnly cookie).
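As an illustration of step 3, the sketch below mints a short-lived access token with PyJWT; in the real backend FastAPI Users handles token issuance, so the claim names, lifetime, and signing setup here are assumptions.

```python
# Sketch: minting a short-lived JWT access token. Claim names, lifetime, and the
# signing secret are assumptions; the backend's FastAPI Users setup is authoritative.
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET = "change-me"                      # placeholder; load from settings in practice
ACCESS_TOKEN_TTL = timedelta(minutes=15)  # "short-lived", as described above


def mint_access_token(user_id: str) -> str:
    now = datetime.now(timezone.utc)
    claims = {"sub": user_id, "iat": now, "exp": now + ACCESS_TOKEN_TTL}
    return jwt.encode(claims, SECRET, algorithm="HS256")
```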
#### Load Persistent Session
1. Frontend calls `GET https://api.avaaz.ai/sessions/default`.
2. Backend retrieves or creates the users **persistent conversational session**, which encapsulates:
- Long-running conversation state
- Lesson and progress context
- Historical summary for LLM context initialization
3. Backend prepares the session's LLM context so that the agent can join with continuity.
#### Join the Live Conversation Session
1. Frontend requests a LiveKit access token via `POST https://api.avaaz.ai/sessions/default/token`.
2. Backend generates a **new LiveKit token** (short-lived, room-scoped; see the token-minting sketch after this list), containing:
- Identity
- Publish/subscribe permissions
- Expiration (bounding the window in which the token can be used to join)
- Room ID corresponding to the session
3. Frontend connects to the LiveKit server:
- WSS for signaling
- UDP/SCTP for low-latency audio and file transfer
4. If the user disconnects, the frontend requests a new LiveKit token before rejoining, ensuring seamless continuity.
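A hedged sketch of the token described in step 2, assuming the `livekit-api` Python package; the identity, room name, and TTL are placeholders.

```python
# Sketch: minting a short-lived, room-scoped LiveKit token. Assumes the
# livekit-api Python package; identity, room, and TTL are placeholders.
import os
from datetime import timedelta

from livekit import api


def mint_livekit_token(user_id: str, room: str = "session-default") -> str:
    token = (
        api.AccessToken(
            os.environ["LIVEKIT_API_KEY"],
            os.environ["LIVEKIT_API_SECRET"],
        )
        .with_identity(user_id)           # identity
        .with_ttl(timedelta(minutes=10))  # expiration bounding the join window
        .with_grants(
            api.VideoGrants(
                room_join=True,
                room=room,                # room mapped to the persistent session
                can_publish=True,         # publish/subscribe permissions
                can_subscribe=True,
            )
        )
    )
    return token.to_jwt()
```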
#### Conversation with AI Agent
1. Backend configures the session's **AI agent** using:
- Historical summary
- Current lesson state
- Language settings and mode (lesson, mock exam, free talk)
2. The agent joins the same LiveKit room as a participant.
3. All media flows through LiveKit:
- User → audio → LiveKit → Agent
- Agent → LLM realtime API → synthesized audio → LiveKit → User
4. The agent guides the user verbally: continuing lessons, revisiting material, running mock exams, or free conversation.
The user experiences this as a **continuous, ongoing session** with seamless reconnection and state persistence.
### 1.9 Hardware
| Class | Description |
|----------------|-------------------------------------------|
| system | Standard PC (i440FX + PIIX, 1996) |
| bus | Motherboard |
| memory | 96KiB BIOS |
| processor | AMD EPYC-Milan Processor |
| memory | 8GiB System Memory |
| bridge | 440FX - 82441FX PMC [Natoma] |
| bridge | 82371SB PIIX3 ISA [Natoma/Triton II] |
| communication | PnP device PNP0501 |
| input | PnP device PNP0303 |
| input | PnP device PNP0f13 |
| storage | PnP device PNP0700 |
| system | PnP device PNP0b00 |
| storage | 82371SB PIIX3 IDE [Natoma/Triton II] |
| bus | 82371SB PIIX3 USB [Natoma/Triton II] |
| bus | UHCI Host Controller |
| input | QEMU USB Tablet |
| bridge | 82371AB/EB/MB PIIX4 ACPI |
| display | QXL paravirtual graphic card |
| generic | Virtio RNG |
| storage | Virtio block device |
| disk | 257GB Virtual I/O device |
| volume | 238GiB EXT4 volume |
| volume | 4095KiB BIOS Boot partition |
| volume | 105MiB Windows FAT volume |
| volume | 913MiB EXT4 volume |
| network | Virtio network device |
| network | Ethernet interface |
| input | Power Button |
| input | AT Translated Set 2 keyboard |
| input | VirtualPS/2 VMware VMMouse |
## 2. Development Laptop
### 2.1 Components
#### App Stack (local Docker)
- `frontend` (Next.js SPA)
- `backend` (FastAPI)
- `postgres` (PostgreSQL + pgvector)
- `livekit` (local LiveKit Server)
No Caddy is deployed locally; the browser talks directly to the mapped container ports on `localhost`.
### 2.2 Network
- All services run as Docker containers on a shared Docker network.
- Selected ports are published to `localhost` for direct access from the browser and local tools.
- No public domains are used in development; everything is addressed via `http://localhost:<port>/...`.
### 2.3 Domains & IP Addresses
Local development uses:
- `http://localhost:3000` → frontend (Next.js dev/server container)
- `http://localhost:8000` → backend API (FastAPI)
- Example auth/session endpoints:
- `POST http://localhost:8000/auth/login`
- `GET http://localhost:8000/sessions/default`
- `POST http://localhost:8000/sessions/default/token`
- `ws://localhost:7880` → LiveKit signaling (local LiveKit server)
- `udp://localhost:50000-60000` → LiveKit/WebRTC media
No `/etc/hosts` changes or TLS certificates are required; `localhost` acts as a secure origin for WebRTC.
### 2.4 Ports & Protocols
| Port | Protocol | Purpose |
|-------------:|:--------:|------------------------------------|
| 3000 | TCP | Frontend (Next.js) |
| 8000 | TCP | Backend API (FastAPI) |
| 5432 | TCP | Postgres + pgvector |
| 7880 | TCP | LiveKit HTTP + WS signaling |
| 50000-60000 | UDP | LiveKit WebRTC media (audio, data) |
### 2.5 Routing
No local Caddy or reverse proxy layer is used; routing is direct via published ports.
#### Internal Container Routing (Docker network)
- Backend → Postgres: `postgres://postgres:5432`
- Backend → LiveKit: `http://livekit:7880`
- Frontend (server-side) → Backend: `http://backend:8000`
#### Browser → Containers (via localhost)
- Browser → Frontend: `http://localhost:3000`
- Browser → Backend API: `http://localhost:8000`
#### Outgoing (from Backend)
- `backend` → `https://api.openai.com/v1/realtime/sessions`
- `backend` → `wss://api.openai.com/v1/realtime?model=gpt-realtime`
- `backend` → `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`
These calls mirror production agent behavior while pointing to the same cloud LLM realtime endpoints.
### 2.6 Functional Layers
#### Data Layer
- Local Postgres instance mirrors the production schema (including pgvector).
- Database migrations are applied via backend tooling (e.g., Alembic) to keep schema in sync.
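One way to keep the local schema in sync is to apply migrations programmatically when the backend container starts, equivalent to running `alembic upgrade head`; the `alembic.ini` path below is an assumption about the container layout.

```python
# Sketch: apply Alembic migrations at start-up (equivalent to `alembic upgrade head`).
# The alembic.ini location is an assumption about the backend container layout.
from alembic import command
from alembic.config import Config


def run_migrations() -> None:
    cfg = Config("alembic.ini")
    command.upgrade(cfg, "head")


if __name__ == "__main__":
    run_migrations()
```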
#### Control Layer
- Backend runs full application logic locally:
- Authentication and authorization
- Lesson and progress APIs
- LiveKit session management (`/sessions/default`, `/sessions/default/token`) and agent control
- Frontend integrates against the same API surface as production, only with `localhost` URLs.
#### Media Layer
- Local LiveKit instance handles:
- WS/HTTP signaling on port 7880
- WebRTC media (audio + data channels) on UDP `50000-60000`
- Agent traffic mirrors production logic:
- LiveKit ↔ Backend ↔ LLM realtime APIs (OpenAI / Gemini).
### 2.7 Typical Workflows
#### Developer Pushes Code
1. Developer pushes to `git.avaaz.ai` over HTTPS or SSH.
2. CI runs automatically (linting, tests, build validation). No deployment occurs.
3. When a release is ready, the developer creates a version tag (`v*`) on a commit in `main`.
4. CD triggers: validates the tag, rebuilds from the tagged commit, deploys updated containers, then performs post-deploy health checks.
#### App Development
- Start the stack: `docker compose -f docker-compose.dev.yml up -d`
- Open the app in the browser: `http://localhost:3000`
- Frontend calls the local backend for:
- `POST http://localhost:8000/auth/login`
- `GET http://localhost:8000/sessions/default`
- `POST http://localhost:8000/sessions/default/token`
#### API Testing
- Health check: `curl http://localhost:8000/health`
- Auth and session testing:
```bash
curl -X POST http://localhost:8000/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "user@example.com", "password": "password"}'
curl http://localhost:8000/sessions/default \
-H "Authorization: Bearer <access_token>"
```
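Because the backend toolchain includes Pytest and HTTPX, the same checks can be scripted as a test against the local stack; the asserted response field (`access_token`) is an assumption about the login response shape.

```python
# Sketch: the curl checks above as a pytest test using httpx. The asserted
# "access_token" field is an assumption about the login response shape.
import httpx

BASE_URL = "http://localhost:8000"


def test_login_and_load_session() -> None:
    with httpx.Client(base_url=BASE_URL) as client:
        login = client.post(
            "/auth/login",
            json={"email": "user@example.com", "password": "password"},
        )
        assert login.status_code == 200
        token = login.json()["access_token"]  # assumed field name

        session = client.get(
            "/sessions/default",
            headers={"Authorization": f"Bearer {token}"},
        )
        assert session.status_code == 200
```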
#### LiveKit Testing
- Frontend connects to LiveKit via:
- Signaling: `ws://localhost:7880`
- WebRTC media: `udp://localhost:50000-60000`
- Backend issues local LiveKit tokens via `POST http://localhost:8000/sessions/default/token`, then connects the AI agent to the local room.
### 2.8 Hardware
| Class | Description |
|----------------|--------------------------------------------|
| system | HP Laptop 14-em0xxx |
| bus | 8B27 motherboard bus |
| memory | 128KiB BIOS |
| processor | AMD Ryzen 3 7320U |
| memory | 256KiB L1 cache |
| memory | 2MiB L2 cache |
| memory | 4MiB L3 cache |
| memory | 8GiB System Memory |
| bridge | Family 17h-19h PCIe Root Complex |
| generic | Family 17h-19h IOMMU |
| storage | SK hynix BC901 HFS256GE SSD |
| disk | 256GB NVMe disk |
| volume | 299MiB Windows FAT volume |
| volume | 238GiB EXT4 volume |
| network | RTL8852BE PCIe 802.11ax Wi-Fi |
| display | Mendocino integrated graphics |
| multimedia | Rembrandt Radeon High Definition Audio |
| generic | Family 19h PSP/CCP |
| bus | AMD xHCI Host Controller |
| input | Logitech M705 Mouse |
| input | Logitech K370s/K375s Keyboard |
| multimedia | Jabra SPEAK 510 USB |
| multimedia | Logitech Webcam C925e |
| communication | Bluetooth Radio |
| multimedia | HP True Vision HD Camera |
| bus | FCH SMBus Controller |
| bridge | FCH LPC Bridge |
| power | AE03041 Battery |
| input | Power Button |
| input | Lid Switch |
| input | HP WMI Hotkeys |
| input | AT Translated Set 2 Keyboard |
| input | Video Bus |
| input | SYNA32D9:00 06CB:CE17 Mouse |
| input | SYNA32D9:00 06CB:CE17 Touchpad |
| network | Ethernet Interface |