Add dockerized app and infra scaffolding

2025-11-26 06:49:58 +01:00
parent 575c02431e
commit 01ebc23e3f
17 changed files with 2426 additions and 111 deletions

docs/architecture.md (new file, 613 additions)

@@ -0,0 +1,613 @@
# System Architecture
Below is a summary of the **Production VPS** and **Development Laptop** architectures. Both environments use Docker containers for consistency, with near-identical stacks where practical.
```mermaid
flowchart LR
%% Client
A(Browser / PWA)
Y(iOS App / Android App)
subgraph User
A
Y
end
%% LLM / Realtime
B(OpenAI Realtime API)
Z(Gemini Live API)
subgraph Large Language Model
B
Z
end
%% Server-side
C(Caddy)
I(Gitea + Actions + Repositories)
J(Gitea Runner)
D(Next.js Frontend)
E(FastAPI Backend + Agent Runtime)
G(LiveKit Server)
H[(PostgreSQL + pgvector)]
%% Client ↔ VPS
A <-- "https://www.avaaz.ai" --> C
A <-- "https://app.avaaz.ai" --> C
A & Y <-- "https://api.avaaz.ai" --> C
A & Y <-- "wss://rtc.avaaz.ai" --> C
A & Y <-- "udp://rtc.avaaz.ai:50000-60000 (WebRTC Media)" --> G
%% Caddy ↔ App
C <-- "http://frontend:3000 (app)" --> D
C <-- "http://backend:8000 (api)" --> E
C <-- "ws://livekit:7880 (WebRTC signaling)" --> G
C <-- "http://gitea:3000 (git)" --> I
%% App internal
D <-- "http://backend:8000" --> E
E <-- "postgresql://postgres:5432" --> H
E <-- "http://livekit:7880 (control)" --> G
E <-- "Agent joins via WebRTC" --> G
%% Agent ↔ LLM
E <-- "WSS/WebRTC (realtime)" --> B
E <-- "WSS (streaming)" --> Z
%% CI/CD
I <-- "CI/CD triggers" --> J
subgraph VPS
subgraph Infra
C
I
J
end
subgraph App
D
E
G
H
end
end
%% Development Environment
L(VS Code + Git + Docker)
M(Local Docker Compose)
N(Local Browser)
O(Local Frontend)
P(Local Backend)
Q[(Local Postgres)]
R(Local LiveKit)
L <-- "https://git.avaaz.ai/...git" --> C
L <-- "ssh://git@git.avaaz.ai:2222/..." --> I
L -- "docker compose up" --> M
M -- "Build & Run" --> O & P & Q & R
N <-- HTTP --> O & P
N <-- WebRTC --> R
O <-- HTTP --> P
P <-- SQL --> Q
P <-- HTTP/WebRTC --> R
P <-- WSS/WebRTC --> B
P <-- WSS --> Z
subgraph Development Laptop
L
M
N
subgraph Local App
O
P
Q
R
end
end
```
## 1. Production VPS
### 1.1 Components
#### Infra Stack
Docker Compose: `./infra/docker-compose.yml`.
| Container | Description |
| -------------- | ----------------------------------------------------------------------------------- |
| `caddy`        | **Caddy**: Reverse proxy with automatic HTTPS (TLS termination via Let's Encrypt).   |
| `gitea`        | **Gitea + Actions**: Git server using SQLite, with automated CI/CD workflows.        |
| `gitea-runner` | **Gitea Runner**: Executes CI/CD jobs defined in Gitea Actions workflows.             |
#### App Stack
Docker Compose: `./app/docker-compose.yml`.
| Container | Description |
| ---------- | ----------------------------------------------------------------------------------------- |
| `frontend` | **Next.js Frontend**: SPA/PWA interface served from a Node.js-based Next.js server.        |
| `backend`  | **FastAPI + Uvicorn Backend**: API, auth, business logic, LiveKit orchestration, agent.    |
| `postgres` | **PostgreSQL + pgvector**: Persistent relational database with vector search.              |
| `livekit`  | **LiveKit Server**: WebRTC signaling plus UDP media for real-time audio and data.          |
The `backend` uses several Python packages, including uv, Ruff, FastAPI, FastAPI Users, fastapi-pagination, FastStream, FastMCP, Pydantic, PydanticAI, pydantic-settings, LiveKit Agents, Google Gemini Live API, OpenAI Realtime API, SQLAlchemy, Alembic, Docling, Gunicorn, Uvicorn[standard], Pyright, Pytest, Hypothesis, and HTTPX to deliver these services.
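As a rough illustration of how a few of these pieces fit together, here is a minimal sketch of backend settings and app wiring with FastAPI and pydantic-settings; the setting names and default values are hypothetical, not taken from the actual codebase.
```python
# Minimal sketch of backend configuration and app wiring (hypothetical names/values).
from fastapi import FastAPI
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    # Read from environment variables (or a .env file) at startup.
    database_url: str = "postgresql+asyncpg://postgres:postgres@postgres:5432/avaaz"
    livekit_url: str = "ws://livekit:7880"
    livekit_api_key: str = ""
    livekit_api_secret: str = ""
    openai_api_key: str = ""


settings = Settings()
app = FastAPI(title="Avaaz API")


@app.get("/health")
async def health() -> dict[str, str]:
    # Liveness probe used by the deploy-time health checks.
    return {"status": "ok"}
```
In production, an app like this would be served by Gunicorn with Uvicorn workers, matching the container description above.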
### 1.2 Network
- All containers join a shared `proxy` Docker network.
- Caddy can route to any service by container name.
- App services communicate internally:
- Frontend ↔ Backend
- Backend ↔ Postgres
- Backend ↔ LiveKit
- Backend (agent) ↔ LiveKit & external LLM realtime APIs
### 1.3 Public DNS Records
| Hostname | Record Type | Target | Purpose |
| -------------------- | :---------: | -------------- | -------------------------------- |
| **www.avaaz.ai**  | CNAME | avaaz.ai       | Marketing / landing site   |
| **avaaz.ai**      | A     | 217.154.51.242 | Root domain                |
| **app.avaaz.ai**  | A     | 217.154.51.242 | Next.js frontend (SPA/PWA) |
| **api.avaaz.ai**  | A     | 217.154.51.242 | FastAPI backend            |
| **rtc.avaaz.ai**  | A     | 217.154.51.242 | LiveKit signaling + media  |
| **git.avaaz.ai**  | A     | 217.154.51.242 | Gitea (HTTPS + SSH)        |
### 1.4 Public Inbound Firewall Ports & Protocols
| Port | Protocol | Purpose |
| -------------: | :------: | --------------------------------------- |
| **80** | TCP | HTTP, ACME HTTP-01 challenge |
| **443** | TCP | HTTPS, WSS (frontend, backend, LiveKit) |
| **2222** | TCP | Git SSH via Gitea |
| **2885** | TCP | VPS SSH access |
| **3478** | UDP | STUN/TURN |
| **5349** | TCP | TURN over TLS |
| **7881** | TCP | LiveKit TCP fallback |
| **50000-60000** | UDP      | LiveKit WebRTC media                    |
### 1.5 Routing
#### Caddy
Caddy routes traffic from public ports 80 and 443 to internal services.
- `https://www.avaaz.ai` → `http://frontend:3000`
- `https://app.avaaz.ai` → `http://frontend:3000`
- `https://api.avaaz.ai` → `http://backend:8000`
- `wss://rtc.avaaz.ai` → `ws://livekit:7880`
- `https://git.avaaz.ai` → `http://gitea:3000`
#### Internal Container Network
- `frontend` → `http://backend:8000`
- `backend` → `postgres://postgres:5432`
- `backend` → `http://livekit:7880` (control)
- `backend` → `ws://livekit:7880` (signaling)
- `backend` → `udp://livekit:50000-60000` (media)
- `gitea-runner` → `/var/run/docker.sock` (Docker API on host)
#### Outgoing
- `backend` → `https://api.openai.com/v1/realtime/sessions`
- `backend` → `wss://api.openai.com/v1/realtime?model=gpt-realtime`
- `backend` → `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`
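As one example of this outgoing traffic, the backend could mint an ephemeral Realtime session against the first endpoint using HTTPX. The request body and response fields shown here are assumptions; the authoritative schema is the OpenAI Realtime documentation.
```python
# Sketch: create an ephemeral OpenAI Realtime session from the backend.
# The payload/response fields are assumptions; check the Realtime API docs.
import os

import httpx


async def create_realtime_session() -> dict:
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    payload = {"model": "gpt-realtime", "voice": "alloy"}  # assumed fields
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(
            "https://api.openai.com/v1/realtime/sessions",
            headers=headers,
            json=payload,
        )
        resp.raise_for_status()
        # Expected to contain an ephemeral client secret the agent can use
        # when opening the WSS connection listed above.
        return resp.json()
```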
### 1.6 Functional Layers
#### Data Layer
**Infra:**
- **SQLite (Gitea)**
- Gitea stores Git metadata (users, repos, issues, Actions metadata) in `/data/gitea/gitea.db`.
- This is a file-backed SQLite database inside a persistent Docker volume.
- Repository contents are stored under `/data/git/`, also volume-backed.
- **Gitea Runner State**
- Gitea Actions runner stores its registration information and job metadata under `/data/.runner`.
**App:**
- **PostgreSQL with pgvector**
- Primary relational database for users, lessons, transcripts, embeddings, and conversational context.
- Hosted in the `postgres` container with a persistent Docker volume.
- Managed via SQLAlchemy and Alembic migrations in the backend.
- **LiveKit Ephemeral State**
- Room metadata, participant state, and signaling information are held in memory within the `livekit` container.
- LiveKit's SFU media buffers and room state are **not** persisted across restarts.
#### Control Layer
**Infra:**
- **Caddy**
- TLS termination (Let's Encrypt).
- Reverse proxy and routing for all public domains.
- ACME certificate renewal.
- **Gitea**
- Git hosting, pull/clone over SSH and HTTPS.
- CI/CD orchestration via Actions and internal APIs.
- **Gitea Runner**
- Executes workflows and controls the Docker engine via `/var/run/docker.sock`.
**App:**
- **FastAPI Backend**
- Authentication and authorization (`/auth/login`, `/auth/refresh`, `/auth/me`).
- REST APIs for lessons, progress, documents, and file handling.
- LiveKit session management (room mapping `/sessions/default`, token minting `/sessions/default/token`, agent configuration).
- Calls out to OpenAI Realtime and Gemini Live APIs for AI-driven conversational behavior.
- **LiveKit Server**
- Manages room signaling, participant permissions, and session state.
- Exposes HTTP control endpoint for room and participant management.
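The token minting mentioned above (`POST /sessions/default/token`) could look roughly like the following sketch, assuming the `livekit-api` Python package; the identity, TTL, and grant values are illustrative rather than the project's actual policy.
```python
# Sketch: mint a short-lived, room-scoped LiveKit access token (illustrative values).
from datetime import timedelta

from livekit import api


def mint_livekit_token(identity: str, room: str, api_key: str, api_secret: str) -> str:
    token = (
        api.AccessToken(api_key, api_secret)
        .with_identity(identity)
        .with_ttl(timedelta(minutes=15))  # short-lived; checked when the client joins
        .with_grants(
            api.VideoGrants(
                room_join=True,
                room=room,
                can_publish=True,
                can_subscribe=True,
            )
        )
    )
    return token.to_jwt()
```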
#### Media Layer
**App:**
- **User Audio Path**
- Browser/mobile → LiveKit:
- WSS signaling via `rtc.avaaz.ai` → Caddy → `livekit:7880`.
- UDP audio and data channels via `rtc.avaaz.ai:50000-60000` directly to LiveKit on the VPS.
- WebRTC handles ICE, STUN/TURN, jitter buffers, and Opus audio encoding.
- **AI Agent Audio Path**
- The agent logic inside the backend uses LiveKit Agent SDK to join rooms as a participant.
- Agent → LiveKit:
- WS signaling over the internal Docker network (`ws://livekit:7880`).
- UDP audio transport as part of its WebRTC session.
- Agent → LLM realtime API:
- Secure WSS/WebRTC connection to OpenAI Realtime or Gemini Live.
- The agent transcribes, processes, and generates audio responses, publishing them into the LiveKit room so the user hears natural speech.
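A minimal agent entrypoint along these lines, assuming the LiveKit Agents Python SDK (`livekit-agents` 1.x) with its OpenAI Realtime plugin, might look like the sketch below; class names and options differ between SDK versions, so treat this as an illustration rather than the project's actual agent code.
```python
# Sketch: an agent worker that joins a LiveKit room and speaks via a realtime model.
# Assumes livekit-agents 1.x and the livekit-plugins-openai package.
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai


async def entrypoint(ctx: agents.JobContext) -> None:
    await ctx.connect()  # WS signaling to ws://livekit:7880 inside the Docker network
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(voice="alloy"),  # WSS to OpenAI Realtime
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a friendly spoken-language tutor."),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```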
### 1.7 CI/CD Pipeline
Production CI/CD is handled by **Gitea Actions** running on the VPS. The `gitea-runner` container has access to the host Docker daemon and is responsible for both validation and deployment:
- `.gitea/workflows/ci.yml`: **Continuous Integration** (branch/PR validation, no deployment).
- `.gitea/workflows/cd.yml`: **Continuous Deployment** (tag-based releases to production).
#### Build Phase (CI Workflow: `ci.yml`)
**Triggers**
- `push` to:
- `feature/**`
- `bugfix/**`
- `pull_request` targeting `main`.
**Runner & Environment**
- Runs on the self-hosted runner labeled `linux_amd64`.
- Checks out the relevant branch or PR commit from the `avaaz-app` repository into the runner's workspace.
**Steps**
1. **Checkout code**
Uses `actions/checkout@v4` to fetch the branch or PR head commit.
2. **Report triggering context**
Logs the event type (`push` or `pull_request`) and branches:
- For `push`: the source branch (e.g., `feature/foo`).
- For `pull_request`: source and target (`main`).
3. **Static analysis & tests**
- Run linters, type checkers, and unit tests for backend and frontend.
- Ensure the application code compiles/builds.
4. **Build Docker images for CI**
- Build images (e.g., `frontend:ci` and `backend:ci`) to validate Dockerfiles and build chain.
- These images are tagged for CI only and not used for production.
5. **Cleanup CI images**
- Remove CI-tagged images at the end of the job (even on failure) to prevent disk usage from accumulating.
**Outcome**
- A green CI result on a branch/PR signals that:
- The code compiles/builds.
- Static checks and tests pass.
- Docker images can be built successfully.
- CI does **not** modify the production stack and does **not** depend on tags.
#### Deploy Phase (CD Workflow: `cd.yml`)
**Triggers**
- Creation of a Git tag matching `v*` that points to a commit on the `main` branch in the `avaaz-app` repository.
**Runner & Environment**
- Runs on the same `linux_amd64` self-hosted runner.
- Checks out the exact commit referenced by the tag.
**Steps**
1. **Checkout tagged commit**
- Uses `actions/checkout@v4` with `ref: ${{ gitea.ref }}` to check out the tagged commit.
2. **Tag validation**
- Fetches `origin/main`.
- Verifies that the tag commit is an ancestor of `origin/main` (i.e., the tag points to code that has been merged into `main`).
- Fails the deployment if the commit is not in `main`'s history.
3. **Build & publish release**
- Builds production Docker images for frontend, backend, LiveKit, etc., tagged with the version (e.g., `v0.1.0`).
- Applies database migrations (e.g., via Alembic) if required.
4. **Restart production stack**
- Restarts or recreates the app stack containers using the newly built/tagged images (e.g., via `docker compose -f docker-compose.yml up -d`).
5. **Health & readiness checks**
- Probes key endpoints with `curl -f`, such as:
- `https://app.avaaz.ai`
- `https://api.avaaz.ai/health`
- `wss://rtc.avaaz.ai` (signaling-level check)
- If checks fail, marks the deployment as failed and automatically rolls back to previous images.
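The health and readiness checks in step 5 could equally be scripted in Python with HTTPX. This is a minimal sketch under the assumptions that only the two HTTPS endpoints are probed and that a non-zero exit code fails the workflow step; the retry count and timeouts are illustrative.
```python
# Sketch: post-deploy readiness probe, a Python stand-in for the `curl -f` checks.
import sys
import time

import httpx

CHECKS = [
    "https://app.avaaz.ai",
    "https://api.avaaz.ai/health",
]


def wait_until_healthy(retries: int = 10, delay: float = 5.0) -> bool:
    for _ in range(retries):
        try:
            if all(httpx.get(url, timeout=10.0).status_code == 200 for url in CHECKS):
                return True
        except httpx.HTTPError:
            pass  # containers may still be starting; retry after a short delay
        time.sleep(delay)
    return False


if __name__ == "__main__":
    sys.exit(0 if wait_until_healthy() else 1)
```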
**Outcome**
- Only tagged releases whose commits are on the `main` branch are deployed.
- Deployment is explicit (tag-based), separated from CI validation.
### 1.8 Typical Workflows
#### User Login
1. Browser loads the frontend from `https://app.avaaz.ai`.
2. Frontend submits credentials to `POST https://api.avaaz.ai/auth/login`.
3. Backend validates credentials and returns:
- A short-lived JWT **access token**
- A long-lived opaque **refresh token**
- A minimal user profile for immediate UI hydration
4. Frontend stores tokens appropriately (access token in memory; refresh token in secure storage or an httpOnly cookie).
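The response described in step 3 could be modeled with Pydantic roughly as follows; the field names are illustrative placeholders, not the actual API contract.
```python
# Sketch: shape of the login response (hypothetical field names).
from pydantic import BaseModel


class UserProfile(BaseModel):
    id: int
    email: str
    display_name: str


class LoginResponse(BaseModel):
    access_token: str          # short-lived JWT
    refresh_token: str         # long-lived opaque token
    token_type: str = "bearer"
    user: UserProfile          # minimal profile for immediate UI hydration
```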
#### Load Persistent Session
1. Frontend calls `GET https://api.avaaz.ai/sessions/default`.
2. Backend retrieves or creates the users **persistent conversational session**, which encapsulates:
- Long-running conversation state
- Lesson and progress context
- Historical summary for LLM context initialization
3. Backend prepares the session's LLM context so that the agent can join with continuity.
#### Join the Live Conversation Session
1. Frontend requests a LiveKit access token via `POST https://api.avaaz.ai/sessions/default/token`.
2. Backend generates a **new LiveKit token** (short-lived, room-scoped), containing:
- Identity
- Publish/subscribe permissions
- Expiration (a short TTL, enforced only at the initial join)
- Room ID corresponding to the session
3. Frontend connects to the LiveKit server:
- WSS for signaling
- UDP/SCTP for low-latency audio and file transfer
4. If the user disconnects, the frontend requests a new LiveKit token before rejoining, ensuring seamless continuity.
#### Conversation with AI Agent
1. Backend configures the session's **AI agent** using:
- Historical summary
- Current lesson state
- Language settings and mode (lesson, mock exam, free talk)
2. The agent joins the same LiveKit room as a participant.
3. All media flows through LiveKit:
- User → audio → LiveKit → Agent
- Agent → LLM realtime API → synthesized audio → LiveKit → User
4. The agent guides the user verbally: continuing lessons, revisiting material, running mock exams, or free conversation.
The user experiences this as a **continuous, ongoing session** with seamless reconnection and state persistence.
### 1.9 Hardware
| Class | Description |
|----------------|-------------------------------------------|
| system | Standard PC (i440FX + PIIX, 1996) |
| bus | Motherboard |
| memory | 96KiB BIOS |
| processor | AMD EPYC-Milan Processor |
| memory | 8GiB System Memory |
| bridge | 440FX - 82441FX PMC [Natoma] |
| bridge | 82371SB PIIX3 ISA [Natoma/Triton II] |
| communication | PnP device PNP0501 |
| input | PnP device PNP0303 |
| input | PnP device PNP0f13 |
| storage | PnP device PNP0700 |
| system | PnP device PNP0b00 |
| storage | 82371SB PIIX3 IDE [Natoma/Triton II] |
| bus | 82371SB PIIX3 USB [Natoma/Triton II] |
| bus | UHCI Host Controller |
| input | QEMU USB Tablet |
| bridge | 82371AB/EB/MB PIIX4 ACPI |
| display | QXL paravirtual graphic card |
| generic | Virtio RNG |
| storage | Virtio block device |
| disk | 257GB Virtual I/O device |
| volume | 238GiB EXT4 volume |
| volume | 4095KiB BIOS Boot partition |
| volume | 105MiB Windows FAT volume |
| volume | 913MiB EXT4 volume |
| network | Virtio network device |
| network | Ethernet interface |
| input | Power Button |
| input | AT Translated Set 2 keyboard |
| input | VirtualPS/2 VMware VMMouse |
## 2. Development Laptop
### 2.1 Components
#### App Stack (local Docker)
- `frontend` (Next.js SPA)
- `backend` (FastAPI)
- `postgres` (PostgreSQL + pgvector)
- `livekit` (local LiveKit Server)
No Caddy is deployed locally; the browser talks directly to the mapped container ports on `localhost`.
### 2.2 Network
- All services run as Docker containers on a shared Docker network.
- Selected ports are published to `localhost` for direct access from the browser and local tools.
- No public domains are used in development; everything is addressed via `http://localhost/...`.
### 2.3 Domains & IP Addresses
Local development uses:
- `http://localhost:3000` → frontend (Next.js dev/server container)
- `http://localhost:8000` → backend API (FastAPI)
- Example auth/session endpoints:
- `POST http://localhost:8000/auth/login`
- `GET http://localhost:8000/sessions/default`
- `POST http://localhost:8000/sessions/default/token`
- `ws://localhost:7880` → LiveKit signaling (local LiveKit server)
- `udp://localhost:50000-60000` → LiveKit/WebRTC media
No `/etc/hosts` changes or TLS certificates are required; `localhost` acts as a secure origin for WebRTC.
### 2.4 Ports & Protocols
| Port | Protocol | Purpose |
|-------------:|:--------:|------------------------------------|
| 3000 | TCP | Frontend (Next.js) |
| 8000 | TCP | Backend API (FastAPI) |
| 5432 | TCP | Postgres + pgvector |
| 7880 | TCP | LiveKit HTTP + WS signaling |
| 50000-60000 | UDP      | LiveKit WebRTC media (audio, data) |
### 2.5 Routing
No local Caddy or reverse proxy layer is used; routing is direct via published ports.
#### Internal Container Routing (Docker network)
- Backend → Postgres: `postgres://postgres:5432`
- Backend → LiveKit: `http://livekit:7880`
- Frontend (server-side) → Backend: `http://backend:8000`
#### Browser → Containers (via localhost)
- Browser → Frontend: `http://localhost:3000`
- Browser → Backend API: `http://localhost:8000`
#### Outgoing (from Backend)
- `backend``https://api.openai.com/v1/realtime/sessions`
- `backend``wss://api.openai.com/v1/realtime?model=gpt-realtime`
- `backend``wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`
These calls mirror production agent behavior while pointing to the same cloud LLM realtime endpoints.
### 2.6 Functional Layers
#### Data Layer
- Local Postgres instance mirrors the production schema (including pgvector).
- Database migrations are applied via backend tooling (e.g., Alembic) to keep schema in sync.
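A schema fragment in this setup might look like the sketch below, using SQLAlchemy 2.0 typed mappings with the `pgvector` Python package; the table, columns, and embedding dimension are illustrative, not the real schema.
```python
# Sketch: an embeddings table with SQLAlchemy 2.0 + pgvector (illustrative schema).
from pgvector.sqlalchemy import Vector
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class TranscriptChunk(Base):
    __tablename__ = "transcript_chunks"

    id: Mapped[int] = mapped_column(primary_key=True)
    session_id: Mapped[int] = mapped_column(index=True)
    text: Mapped[str]
    # Embedding dimension depends on the embedding model in use.
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))
```
A change like this would then be captured with `alembic revision --autogenerate` and applied with `alembic upgrade head`, which is what keeps the local schema in sync with production.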
#### Control Layer
- Backend runs full application logic locally:
- Authentication and authorization
- Lesson and progress APIs
- LiveKit session management (`/sessions/default`, `/sessions/default/token`) and agent control
- Frontend integrates against the same API surface as production, only with `localhost` URLs.
#### Media Layer
- Local LiveKit instance handles:
- WS/HTTP signaling on port 7880
- WebRTC media (audio + data channels) on UDP `50000-60000`
- Agent traffic mirrors production logic:
- LiveKit ↔ Backend ↔ LLM realtime APIs (OpenAI / Gemini).
### 2.7 Typical Workflows
#### Developer Pushes Code
1. Developer pushes to `git.avaaz.ai` over HTTPS or SSH.
2. CI runs automatically (linting, tests, build validation). No deployment occurs.
3. When a release is ready, the developer creates a version tag (`v*`) on a commit in `main`.
4. CD triggers: validates the tag, rebuilds from the tagged commit, deploys updated containers, then performs post-deploy health checks.
#### App Development
- Start the stack: `docker compose -f docker-compose.dev.yml up -d`
- Open the app in the browser: `http://localhost:3000`
- Frontend calls the local backend for:
- `POST http://localhost:8000/auth/login`
- `GET http://localhost:8000/sessions/default`
- `POST http://localhost:8000/sessions/default/token`
#### API Testing
- Health check: `curl http://localhost:8000/health`
- Auth and session testing:
```bash
curl -X POST http://localhost:8000/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "user@example.com", "password": "password"}'
curl http://localhost:8000/sessions/default \
-H "Authorization: Bearer <access_token>"
```
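The same flow can be exercised from a test, in line with the Pytest and HTTPX tooling listed for the backend. This sketch assumes `pytest-asyncio` is installed and that the login response exposes an `access_token` field; credentials are the same illustrative ones as in the curl commands above.
```python
# Sketch: auth + session smoke test mirroring the curl commands above.
# Assumes pytest-asyncio; credentials and response fields are illustrative.
import httpx
import pytest

BASE_URL = "http://localhost:8000"


@pytest.mark.asyncio
async def test_login_and_load_default_session() -> None:
    async with httpx.AsyncClient(base_url=BASE_URL) as client:
        login = await client.post(
            "/auth/login",
            json={"email": "user@example.com", "password": "password"},
        )
        assert login.status_code == 200
        access_token = login.json()["access_token"]  # assumed response field

        session = await client.get(
            "/sessions/default",
            headers={"Authorization": f"Bearer {access_token}"},
        )
        assert session.status_code == 200
```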
#### LiveKit Testing
- Frontend connects to LiveKit via:
- Signaling: `ws://localhost:7880`
- WebRTC media: `udp://localhost:50000-60000`
- Backend issues local LiveKit tokens via `POST http://localhost:8000/sessions/default/token`, then connects the AI agent to the local room.
### 2.8 Hardware
| Class | Description |
|----------------|--------------------------------------------|
| system | HP Laptop 14-em0xxx |
| bus | 8B27 motherboard bus |
| memory | 128KiB BIOS |
| processor | AMD Ryzen 3 7320U |
| memory | 256KiB L1 cache |
| memory | 2MiB L2 cache |
| memory | 4MiB L3 cache |
| memory | 8GiB System Memory |
| bridge | Family 17h-19h PCIe Root Complex |
| generic | Family 17h-19h IOMMU |
| storage | SK hynix BC901 HFS256GE SSD |
| disk | 256GB NVMe disk |
| volume | 299MiB Windows FAT volume |
| volume | 238GiB EXT4 volume |
| network | RTL8852BE PCIe 802.11ax Wi-Fi |
| display | Mendocino integrated graphics |
| multimedia | Rembrandt Radeon High Definition Audio |
| generic | Family 19h PSP/CCP |
| bus | AMD xHCI Host Controller |
| input | Logitech M705 Mouse |
| input | Logitech K370s/K375s Keyboard |
| multimedia | Jabra SPEAK 510 USB |
| multimedia | Logitech Webcam C925e |
| communication | Bluetooth Radio |
| multimedia | HP True Vision HD Camera |
| bus | FCH SMBus Controller |
| bridge | FCH LPC Bridge |
| power | AE03041 Battery |
| input | Power Button |
| input | Lid Switch |
| input | HP WMI Hotkeys |
| input | AT Translated Set 2 Keyboard |
| input | Video Bus |
| input | SYNA32D9:00 06CB:CE17 Mouse |
| input | SYNA32D9:00 06CB:CE17 Touchpad |
| network | Ethernet Interface |