# System Architecture

Below is a summary of the **Production VPS** and **Development Laptop** architectures. Both environments use Docker containers for consistency, with near-identical stacks where practical.

```mermaid
flowchart LR
    %% Client
    A(Browser)
    Y(iOS App / Android App)
    subgraph User
        A
        Y
    end

    %% LLM / Realtime
    B(OpenAI Realtime API)
    Z(Gemini Live API)
    subgraph Large Language Model
        B
        Z
    end

    %% Server-side
    C(Caddy)
    I(Gitea + Actions + Repositories)
    J(Gitea Runner)
    D(React/Next.js/Tailwind Frontend)
    E(FastAPI Backend + Agent Runtime)
    G(LiveKit Server)
    H[(PostgreSQL + pgvector)]

    %% Client ↔ VPS
    A <-- "https://www.avaaz.ai" --> C
    A <-- "https://app.avaaz.ai" --> C
    A & Y <-- "https://api.avaaz.ai" --> C
    A & Y <-- "wss://rtc.avaaz.ai" --> C
    A & Y <-- "udp://rtc.avaaz.ai:50000-60000 (WebRTC Media)" --> G

    %% Caddy ↔ App
    C <-- "http://frontend:3000 (app)" --> D
    C <-- "http://backend:8000 (api)" --> E
    C <-- "ws://livekit:7880 (WebRTC signaling)" --> G
    C <-- "http://gitea:3000 (git)" --> I

    %% App internal
    D <-- "http://backend:8000" --> E
    E <-- "postgresql://postgres:5432" --> H
    E <-- "http://livekit:7880 (control)" --> G
    E <-- "Agent joins via WebRTC" --> G

    %% Agent ↔ LLM
    E <-- "WSS/WebRTC (realtime)" --> B
    E <-- "WSS (streaming)" --> Z

    %% CI/CD
    I <-- "CI/CD triggers" --> J

    subgraph VPS
        subgraph Infra
            C
            I
            J
        end
        subgraph App
            D
            E
            G
            H
        end
    end

    %% Development Environment
    L(VS Code + Git + Docker)
    M(Local Docker Compose)
    N(Local Browser)
    O(Local Frontend)
    P(Local Backend)
    Q[(Local Postgres)]
    R(Local LiveKit)

    L <-- "https://git.avaaz.ai/...git" --> C
    L <-- "ssh://git@git.avaaz.ai:2222/..." --> I
    L -- "docker compose up" --> M
    M -- "Build & Run" --> O & P & Q & R
    N <-- HTTP --> O & P
    N <-- WebRTC --> R
    O <-- HTTP --> P
    P <-- SQL --> Q
    P <-- HTTP/WebRTC --> R
    P <-- WSS/WebRTC --> B
    P <-- WSS --> Z

    subgraph Development Laptop
        L
        M
        N
        subgraph Local App
            O
            P
            Q
            R
        end
    end
```

## 1. Production VPS

### 1.1 Components

#### Infra Stack

Docker Compose from `./infra/docker-compose.yml` is cloned to `/srv/infra/docker-compose.yml` on the VPS.

| Container      | Description                                                                         |
| -------------- | ----------------------------------------------------------------------------------- |
| `caddy`        | **Caddy** – Reverse proxy with automatic HTTPS (TLS termination via Let’s Encrypt). |
| `gitea`        | **Gitea + Actions** – Git server using SQLite. Automated CI/CD workflows.           |
| `gitea-runner` | **Gitea Runner** – Executes CI/CD jobs defined in Gitea Actions workflows.          |

#### App Stack

Docker Compose from `./app/docker-compose.yml` is cloned to `/srv/app/docker-compose.yml` on the VPS.

| Container  | Description                                                                                     |
| ---------- | ----------------------------------------------------------------------------------------------- |
| `frontend` | **React/Next.js/Tailwind Frontend** – SPA interface served from a Node.js-based Next.js server. |
| `backend`  | **FastAPI + Uvicorn Backend** – API, auth, business logic, LiveKit orchestration, agent.        |
| `postgres` | **PostgreSQL + pgvector** – Persistent relational database with vector search.                  |
| `livekit`  | **LiveKit Server** – WebRTC signaling plus UDP media for real-time audio and data.              |

The `backend` is built with Python packages such as UV, Ruff, FastAPI, FastAPI Users, FastAPI-pagination, FastStream, FastMCP, Pydantic, PydanticAI, Pydantic-settings, the LiveKit Agents SDK, SQLAlchemy, Alembic, docling, Gunicorn, Uvicorn[standard], Pyright, Pytest, Hypothesis, and HTTPX, plus client integrations for the Google Gemini Live and OpenAI Realtime APIs.

### 1.2 Network

- All containers join a shared `proxy` Docker network.
- Caddy can route to any service by container name.
- App services communicate internally:
  - Frontend ↔ Backend
  - Backend ↔ Postgres
  - Backend ↔ LiveKit
  - Backend (agent) ↔ LiveKit & external LLM realtime APIs

### 1.3 Public DNS Records

| Hostname             | Record Type | Target         | Purpose                               |
| -------------------- | :---------: | -------------- | ------------------------------------- |
| **www\.avaaz\.ai**   | CNAME       | avaaz.ai       | Marketing / landing site              |
| **avaaz.ai**         | A           | 217.154.51.242 | Root domain                           |
| **app.avaaz.ai**     | A           | 217.154.51.242 | React/Next.js/Tailwind frontend (SPA) |
| **api.avaaz.ai**     | A           | 217.154.51.242 | FastAPI backend                       |
| **rtc.avaaz.ai**     | A           | 217.154.51.242 | LiveKit signaling + media             |
| **git.avaaz.ai**     | A           | 217.154.51.242 | Gitea (HTTPS + SSH)                   |

### 1.4 Public Inbound Firewall Ports & Protocols

| Port            | Protocol | Purpose                                 |
| --------------: | :------: | --------------------------------------- |
| **80**          | TCP      | HTTP, ACME HTTP-01 challenge            |
| **443**         | TCP      | HTTPS, WSS (frontend, backend, LiveKit) |
| **2222**        | TCP      | Git SSH via Gitea                       |
| **2885**        | TCP      | VPS SSH access                          |
| **3478**        | UDP      | STUN/TURN                               |
| **5349**        | TCP      | TURN over TLS                           |
| **7881**        | TCP      | LiveKit TCP fallback                    |
| **50000–60000** | UDP      | LiveKit WebRTC media                    |

### 1.5 Routing

#### Caddy

Caddy routes traffic from public ports 80 and 443 to internal services.
- `https://www.avaaz.ai` → `http://frontend:3000`
- `https://app.avaaz.ai` → `http://frontend:3000`
- `https://api.avaaz.ai` → `http://backend:8000`
- `wss://rtc.avaaz.ai` → `ws://livekit:7880`
- `https://git.avaaz.ai` → `http://gitea:3000`

#### Internal Container Network

- `frontend` → `http://backend:8000`
- `backend` → `postgresql://postgres:5432`
- `backend` → `http://livekit:7880` (control)
- `backend` → `ws://livekit:7880` (signaling)
- `backend` → `udp://livekit:50000-60000` (media)
- `gitea-runner` → `/var/run/docker.sock` (Docker API on host)

#### Outgoing

- `backend` → `https://api.openai.com/v1/realtime/sessions`
- `backend` → `wss://api.openai.com/v1/realtime?model=gpt-realtime`
- `backend` → `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`

### 1.6 Functional Layers

#### Data Layer

**Infra:**

- **SQLite (Gitea)**
  - Gitea stores Git metadata (users, repos, issues, Actions metadata) in `/data/gitea/gitea.db`.
  - This is a file-backed SQLite database inside a persistent Docker volume.
  - Repository contents are stored under `/data/git/`, also volume-backed.
- **Gitea Runner State**
  - The Gitea Actions runner stores its registration information and job metadata under `/data/.runner`.

**App:**

- **PostgreSQL with pgvector**
  - Primary relational database for users, lessons, transcripts, embeddings, and conversational context.
  - Hosted in the `postgres` container with a persistent Docker volume.
  - Managed via SQLAlchemy and Alembic migrations in the backend.
- **LiveKit Ephemeral State**
  - Room metadata, participant states, and signaling information are held in memory within the `livekit` container.
  - LiveKit’s SFU media buffers and room state are **not** persisted across restarts.

#### Control Layer

**Infra:**

- **Caddy**
  - TLS termination (Let’s Encrypt).
  - Reverse proxy and routing for all public domains.
  - ACME certificate renewal.
- **Gitea**
  - Git hosting, pull/clone over SSH and HTTPS.
  - CI/CD orchestration via Actions and internal APIs.
- **Gitea Runner**
  - Executes workflows and controls the Docker engine via `/var/run/docker.sock`.

**App:**

- **FastAPI Backend**
  - Authentication and authorization (`/auth/login`, `/auth/refresh`, `/auth/me`).
  - REST APIs for lessons, progress, documents, and file handling.
  - LiveKit session management (room mapping `/sessions/default`, token minting `/sessions/default/token`, agent configuration).
  - Calls out to the OpenAI Realtime and Gemini Live APIs for AI-driven conversational behavior.
- **LiveKit Server**
  - Manages room signaling, participant permissions, and session state.
  - Exposes an HTTP control endpoint for room and participant management.

#### Media Layer

**App:**

- **User Audio Path**
  - Browser/mobile → LiveKit:
    - WSS signaling via `rtc.avaaz.ai` → Caddy → `livekit:7880`.
    - UDP audio and data channels via `rtc.avaaz.ai:50000–60000` directly to LiveKit on the VPS.
  - WebRTC handles ICE, STUN/TURN, jitter buffers, and Opus audio encoding.
- **AI Agent Audio Path**
  - The agent logic inside the backend uses the LiveKit Agents SDK to join rooms as a participant.
  - Agent → LiveKit:
    - WS signaling over the internal Docker network (`ws://livekit:7880`).
    - UDP audio transport as part of its WebRTC session.
  - Agent → LLM realtime API:
    - Secure WSS/WebRTC connection to OpenAI Realtime or Gemini Live.
  - The agent transcribes, processes, and generates audio responses, publishing them into the LiveKit room so the user hears natural speech.

### 1.7 CI/CD Pipeline

Production CI/CD is handled by **Gitea Actions** running on the VPS. The `gitea-runner` container has access to the host Docker daemon and is responsible for both validation and deployment:

- `.gitea/workflows/ci.yml` – **Continuous Integration** (branch/PR validation, no deployment).
- `.gitea/workflows/cd.yml` – **Continuous Deployment** (tag-based releases to production).
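The deploy gate in `cd.yml` hinges on commit ancestry: a tag may only deploy if its commit is reachable from `main`. In the workflow this is a `git` call (e.g., `git merge-base --is-ancestor`); the underlying idea can be sketched in pure Python over a parent map (the commit names below are made up for illustration):

```python
def is_ancestor(parents: dict[str, list[str]], commit: str, tip: str) -> bool:
    """Return True if `commit` is reachable from `tip` by walking parent links."""
    stack, seen = [tip], set()
    while stack:
        current = stack.pop()
        if current == commit:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False

# Toy history: a feature branch (c1..c3) merged into main at m2,
# plus a stray commit (x1) that was never merged.
history = {
    "m2": ["m1", "c3"],   # merge commit on main
    "c3": ["c2"],
    "c2": ["c1"],
    "c1": ["m1"],
    "m1": ["m0"],
    "x1": ["c1"],         # never merged into main
}

print(is_ancestor(history, "c3", "m2"))  # True  → tag deployable
print(is_ancestor(history, "x1", "m2"))  # False → deployment rejected
```

This is why the workflow fetches `origin/main` first: the ancestry walk is only meaningful against an up-to-date view of the branch tip.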
#### Build Phase (CI Workflow: `ci.yml`)

**Triggers**

- `push` to:
  - `feature/**`
  - `bugfix/**`
- `pull_request` targeting `main`.

**Runner & Environment**

- Runs on the self-hosted runner labeled `linux_amd64`.
- Checks out the relevant branch or PR commit from the `avaaz-app` repository into the runner’s workspace.

**Steps**

1. **Checkout code**
   Uses `actions/checkout@v4` to fetch the branch or PR head commit.
2. **Report triggering context**
   Logs the event type (`push` or `pull_request`) and branches:
   - For `push`: the source branch (e.g., `feature/foo`).
   - For `pull_request`: source and target (`main`).
3. **Static analysis & tests**
   - Run linters, type checkers, and unit tests for backend and frontend.
   - Ensure the application code compiles/builds.
4. **Build Docker images for CI**
   - Build images (e.g., `frontend:ci` and `backend:ci`) to validate Dockerfiles and the build chain.
   - These images are tagged for CI only and not used for production.
5. **Cleanup CI images**
   - Remove CI-tagged images at the end of the job (even on failure) to prevent disk usage from accumulating.

**Outcome**

- A green CI result on a branch/PR signals that:
  - The code compiles/builds.
  - Static checks and tests pass.
  - Docker images can be built successfully.
- CI does **not** modify the production stack and does **not** depend on tags.

#### Deploy Phase (CD Workflow: `cd.yml`)

**Triggers**

- Creation of a Git tag matching `v*` that points to a commit on the `main` branch in the `avaaz-app` repository.

**Runner & Environment**

- Runs on the same `linux_amd64` self-hosted runner.
- Checks out the exact commit referenced by the tag.

**Steps**

1. **Checkout tagged commit**
   - Uses `actions/checkout@v4` with `ref: ${{ gitea.ref }}` to check out the tagged commit.
2. **Tag validation**
   - Fetches `origin/main`.
   - Verifies that the tag commit is an ancestor of `origin/main` (i.e., the tag points to code that has been merged into `main`).
   - Fails the deployment if the commit is not in `main`’s history.
3. **Build & publish release**
   - Builds production Docker images for frontend, backend, LiveKit, etc., tagged with the version (e.g., `v0.1.0`).
   - Applies database migrations (e.g., via Alembic) if required.
4. **Restart production stack**
   - Restarts or recreates the app stack containers using the newly built/tagged images (e.g., via `docker compose -f docker-compose.yml up -d`).
5. **Health & readiness checks**
   - Probes key endpoints with `curl -f`, such as:
     - `https://app.avaaz.ai`
     - `https://api.avaaz.ai/health`
     - `wss://rtc.avaaz.ai` (signaling-level check)
   - If checks fail, marks the deployment as failed and automatically rolls back to the previous images.

**Outcome**

- Only tagged releases whose commits are on the `main` branch are deployed.
- Deployment is explicit (tag-based) and separated from CI validation.

### 1.8 Typical Workflows

#### User Login

1. Browser loads the frontend from `https://app.avaaz.ai`.
2. Frontend submits credentials to `POST https://api.avaaz.ai/auth/login`.
3. Backend validates the credentials and returns:
   - A short-lived JWT **access token**
   - A long-lived opaque **refresh token**
   - A minimal user profile for immediate UI hydration
4. Frontend stores the tokens appropriately (access token in memory; refresh token in secure storage or an httpOnly cookie).

#### Load Persistent Session

1. Frontend calls `GET https://api.avaaz.ai/sessions/default`.
2. Backend retrieves or creates the user’s **persistent conversational session**, which encapsulates:
   - Long-running conversation state
   - Lesson and progress context
   - A historical summary for LLM context initialization
3. Backend prepares the session’s LLM context so that the agent can join with continuity.

#### Join the Live Conversation Session

1. Frontend requests a LiveKit access token via `POST https://api.avaaz.ai/sessions/default/token`.
2. Backend generates a **new LiveKit token** (short-lived, room-scoped) containing:
   - Identity
   - Publish/subscribe permissions
   - Expiration (enforced at initial join)
   - The room ID corresponding to the session
3. Frontend connects to the LiveKit server:
   - WSS for signaling
   - UDP/SCTP for low-latency audio and file transfer
4. If the user disconnects, the frontend requests a new LiveKit token before rejoining, ensuring seamless continuity.

#### Conversation with AI Agent

1. Backend configures the session’s **AI agent** using:
   - The historical summary
   - Current lesson state
   - Language settings and mode (lesson, mock exam, free talk)
2. The agent joins the same LiveKit room as a participant.
3. All media flows through LiveKit:
   - User → audio → LiveKit → Agent
   - Agent → LLM realtime API → synthesized audio → LiveKit → User
4. The agent guides the user verbally: continuing lessons, revisiting material, running mock exams, or holding free conversation.

The user experiences this as a **continuous, ongoing session** with seamless reconnection and state persistence.
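The token-minting step above produces a signed JWT carrying a LiveKit-style `video` grant. A stdlib-only sketch of the shape of such a token — HS256 hand-rolled here purely for illustration; the real backend would use the LiveKit SDK, and the key/secret/identity values below are placeholders:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def mint_livekit_token(api_key: str, api_secret: str, identity: str, room: str,
                       ttl_seconds: int = 600) -> str:
    """Build an HS256 JWT with a LiveKit-style room-scoped video grant."""
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "iss": api_key,            # LiveKit API key
        "sub": identity,           # participant identity
        "nbf": now,
        "exp": now + ttl_seconds,  # checked when the participant joins
        "video": {                 # room-scoped permissions
            "room": room,
            "roomJoin": True,
            "canPublish": True,
            "canSubscribe": True,
        },
    }
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(payload).encode())}"
    signature = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


token = mint_livekit_token("devkey", "devsecret", "user-123", "session-default")
print(token.count("."))  # 2 → the three JWT segments
```

Because the expiration is enforced at join time, reconnecting after the TTL requires fetching a fresh token via `POST /sessions/default/token`, which is exactly the rejoin flow described above.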
### 1.9 Hardware

| Class          | Description                                |
|----------------|--------------------------------------------|
| system         | Standard PC (i440FX + PIIX, 1996)          |
| bus            | Motherboard                                |
| memory         | 96KiB BIOS                                 |
| processor      | AMD EPYC-Milan Processor                   |
| memory         | 8GiB System Memory                         |
| bridge         | 440FX - 82441FX PMC [Natoma]               |
| bridge         | 82371SB PIIX3 ISA [Natoma/Triton II]       |
| communication  | PnP device PNP0501                         |
| input          | PnP device PNP0303                         |
| input          | PnP device PNP0f13                         |
| storage        | PnP device PNP0700                         |
| system         | PnP device PNP0b00                         |
| storage        | 82371SB PIIX3 IDE [Natoma/Triton II]       |
| bus            | 82371SB PIIX3 USB [Natoma/Triton II]       |
| bus            | UHCI Host Controller                       |
| input          | QEMU USB Tablet                            |
| bridge         | 82371AB/EB/MB PIIX4 ACPI                   |
| display        | QXL paravirtual graphic card               |
| generic        | Virtio RNG                                 |
| storage        | Virtio block device                        |
| disk           | 257GB Virtual I/O device                   |
| volume         | 238GiB EXT4 volume                         |
| volume         | 4095KiB BIOS Boot partition                |
| volume         | 105MiB Windows FAT volume                  |
| volume         | 913MiB EXT4 volume                         |
| network        | Virtio network device                      |
| network        | Ethernet interface                         |
| input          | Power Button                               |
| input          | AT Translated Set 2 keyboard               |
| input          | VirtualPS/2 VMware VMMouse                 |

## 2. Development Laptop

### 2.1 Components

#### App Stack (local Docker)

- `frontend` (React/Next.js/Tailwind SPA)
- `backend` (FastAPI)
- `postgres` (PostgreSQL + pgvector)
- `livekit` (local LiveKit Server)

No Caddy is deployed locally; the browser talks directly to the mapped container ports on `localhost`.

### 2.2 Network

- All services run as Docker containers on a shared Docker network.
- Selected ports are published to `localhost` for direct access from the browser and local tools.
- No public domains are used in development; everything is addressed via `http://localhost/...`.
### 2.3 Domains & IP Addresses

Local development uses:

- `http://localhost:3000` → frontend (React/Next.js/Tailwind dev server container)
- `http://localhost:8000` → backend API (FastAPI)
  - Example auth/session endpoints:
    - `POST http://localhost:8000/auth/login`
    - `GET http://localhost:8000/sessions/default`
    - `POST http://localhost:8000/sessions/default/token`
- `ws://localhost:7880` → LiveKit signaling (local LiveKit server)
- `udp://localhost:50000–60000` → LiveKit/WebRTC media

No `/etc/hosts` changes or TLS certificates are required; `localhost` acts as a secure origin for WebRTC.

### 2.4 Ports & Protocols

| Port         | Protocol | Purpose                            |
|-------------:|:--------:|------------------------------------|
| 3000         | TCP      | Frontend (React/Next.js/Tailwind)  |
| 8000         | TCP      | Backend API (FastAPI)              |
| 5432         | TCP      | Postgres + pgvector                |
| 7880         | TCP      | LiveKit HTTP + WS signaling        |
| 50000–60000  | UDP      | LiveKit WebRTC media (audio, data) |

### 2.5 Routing

No local Caddy or reverse proxy layer is used; routing is direct via published ports.

#### Internal Container Routing (Docker network)

- Backend → Postgres: `postgresql://postgres:5432`
- Backend → LiveKit: `http://livekit:7880`
- Frontend (server-side) → Backend: `http://backend:8000`

#### Browser → Containers (via localhost)

- Browser → Frontend: `http://localhost:3000`
- Browser → Backend API: `http://localhost:8000`

#### Outgoing (from Backend)

- `backend` → `https://api.openai.com/v1/realtime/sessions`
- `backend` → `wss://api.openai.com/v1/realtime?model=gpt-realtime`
- `backend` → `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`

These calls mirror production agent behavior while pointing at the same cloud LLM realtime endpoints.

### 2.6 Functional Layers

#### Data Layer

- The local Postgres instance mirrors the production schema (including pgvector).
- Database migrations are applied via backend tooling (e.g., Alembic) to keep the schema in sync.
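The pgvector extension mentioned above ranks rows by vector distance (e.g., its `<=>` cosine-distance operator). Conceptually, a similarity lookup over stored embeddings reduces to the following — with toy three-dimensional vectors standing in for real embeddings, which have hundreds of dimensions:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance in the sense of pgvector's `<=>`: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Toy stand-in for an embeddings table; the real query would be SQL like
#   SELECT ... ORDER BY embedding <=> :query_embedding LIMIT k;
rows = {
    "greeting lesson": [0.9, 0.1, 0.0],
    "exam practice":   [0.0, 0.8, 0.6],
    "free talk":       [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

ranked = sorted(rows, key=lambda name: cosine_distance(rows[name], query))
print(ranked[0])  # greeting lesson — the nearest row to the query embedding
```

Running the same extension locally and in production means this ranking behaves identically in both environments, which is exactly why the dev schema mirrors production.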
#### Control Layer

- Backend runs the full application logic locally:
  - Authentication and authorization
  - Lesson and progress APIs
  - LiveKit session management (`/sessions/default`, `/sessions/default/token`) and agent control
- Frontend integrates against the same API surface as production, only with `localhost` URLs.

#### Media Layer

- The local LiveKit instance handles:
  - WS/HTTP signaling on port 7880
  - WebRTC media (audio + data channels) on UDP `50000–60000`
- Agent traffic mirrors production logic:
  - LiveKit ↔ Backend ↔ LLM realtime APIs (OpenAI / Gemini)

### 2.7 Typical Workflows

#### Developer Pushes Code

1. Developer pushes to `git.avaaz.ai` over HTTPS or SSH.
2. CI runs automatically (linting, tests, build validation). No deployment occurs.
3. When a release is ready, the developer creates a version tag (`v*`) on a commit in `main`.
4. CD triggers: validates the tag, rebuilds from the tagged commit, deploys the updated containers, then performs post-deploy health checks.

#### App Development

- Start the stack: `docker compose -f docker-compose.dev.yml up -d`
- Open the app in the browser: `http://localhost:3000`
- Frontend calls the local backend for:
  - `POST http://localhost:8000/auth/login`
  - `GET http://localhost:8000/sessions/default`
  - `POST http://localhost:8000/sessions/default/token`

#### API Testing

- Health check: `curl http://localhost:8000/health`
- Auth and session testing:

```bash
curl -X POST http://localhost:8000/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com", "password": "password"}'

curl http://localhost:8000/sessions/default \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```

#### LiveKit Testing

- Frontend connects to LiveKit via:
  - Signaling: `ws://localhost:7880`
  - WebRTC media: `udp://localhost:50000–60000`
- Backend issues local LiveKit tokens via `POST http://localhost:8000/sessions/default/token`, then connects the AI agent to the local room.
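The health checks used in API testing here and in the CD workflow follow one pattern: probe an endpoint, treat a non-2xx status or connection error as failure, and retry a few times before giving up. A small sketch with the HTTP call injected as a callable so the retry logic stands alone (the retry counts are arbitrary, and the stub below simulates a backend that becomes healthy on the second attempt):

```python
import time
from typing import Callable


def probe(url: str, fetch: Callable[[str], int], attempts: int = 3,
          delay: float = 0.0) -> bool:
    """Return True once `fetch(url)` reports a 2xx status, retrying on failure."""
    for attempt in range(attempts):
        try:
            status = fetch(url)
            if 200 <= status < 300:
                return True
        except OSError:
            pass  # connection refused, DNS failure, etc.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# In real use, `fetch` would wrap urllib or httpx, e.g.:
#   def fetch(url): return urllib.request.urlopen(url, timeout=5).status
# Here a stub returns 503 on the first call and 200 on the second.
responses = iter([503, 200])
healthy = probe("http://localhost:8000/health", lambda url: next(responses))
print(healthy)  # True
```

The same function works unchanged against `http://localhost:8000/health` in development and `https://api.avaaz.ai/health` in production; only the `fetch` wrapper and the URL differ.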
### 2.8 Hardware

| Class          | Description                                |
|----------------|--------------------------------------------|
| system         | HP Laptop 14-em0xxx                        |
| bus            | 8B27 motherboard bus                       |
| memory         | 128KiB BIOS                                |
| processor      | AMD Ryzen 3 7320U                          |
| memory         | 256KiB L1 cache                            |
| memory         | 2MiB L2 cache                              |
| memory         | 4MiB L3 cache                              |
| memory         | 8GiB System Memory                         |
| bridge         | Family 17h-19h PCIe Root Complex           |
| generic        | Family 17h-19h IOMMU                       |
| storage        | SK hynix BC901 HFS256GE SSD                |
| disk           | 256GB NVMe disk                            |
| volume         | 299MiB Windows FAT volume                  |
| volume         | 238GiB EXT4 volume                         |
| network        | RTL8852BE PCIe 802.11ax Wi-Fi              |
| display        | Mendocino integrated graphics              |
| multimedia     | Rembrandt Radeon High Definition Audio     |
| generic        | Family 19h PSP/CCP                         |
| bus            | AMD xHCI Host Controller                   |
| input          | Logitech M705 Mouse                        |
| input          | Logitech K370s/K375s Keyboard              |
| multimedia     | Jabra SPEAK 510 USB                        |
| multimedia     | Logitech Webcam C925e                      |
| communication  | Bluetooth Radio                            |
| multimedia     | HP True Vision HD Camera                   |
| bus            | FCH SMBus Controller                       |
| bridge         | FCH LPC Bridge                             |
| power          | AE03041 Battery                            |
| input          | Power Button                               |
| input          | Lid Switch                                 |
| input          | HP WMI Hotkeys                             |
| input          | AT Translated Set 2 Keyboard               |
| input          | Video Bus                                  |
| input          | SYNA32D9:00 06CB:CE17 Mouse                |
| input          | SYNA32D9:00 06CB:CE17 Touchpad             |
| network        | Ethernet Interface                         |