# System Architecture

Below is a summary of the Production VPS and Development Laptop architectures. Both environments use Docker containers for consistency, with near-identical stacks where practical.

```mermaid
flowchart LR
%% Client
    A(Browser / PWA)
    Y(iOS App / Android App)
    
    subgraph User
        A
        Y
    end

%% LLM / Realtime
    B(OpenAI Realtime API)
    Z(Gemini Live API)

    subgraph Large Language Model
        B
        Z
    end

%% Server-side
    C(Caddy)
    I(Gitea + Actions + Repositories)
    J(Gitea Runner)

    D(Next.js Frontend)
    E(FastAPI Backend + Agent Runtime)
    G(LiveKit Server)
    H[(PostgreSQL + pgvector)]

%% Client ↔ VPS
    A <-- "https://www.avaaz.ai" --> C
    A <-- "https://app.avaaz.ai" --> C
    A & Y <-- "https://api.avaaz.ai" --> C
    A & Y <-- "wss://rtc.avaaz.ai" --> C
    A & Y <-- "udp://rtc.avaaz.ai:50000-60000 (WebRTC Media)" --> G

%% Caddy ↔ App
    C <-- "http://frontend:3000 (app)" --> D
    C <-- "http://backend:8000 (api)" --> E
    C <-- "ws://livekit:7880 (WebRTC signaling)" --> G
    C <-- "http://gitea:3000 (git)" --> I

%% App internal
    D <-- "http://backend:8000" --> E
    E <-- "postgresql://postgres:5432" --> H
    E <-- "http://livekit:7880 (control)" --> G
    E <-- "Agent joins via WebRTC" --> G

%% Agent ↔ LLM
    E <-- "WSS/WebRTC (realtime)" --> B
    E <-- "WSS (streaming)" --> Z

%% CI/CD
    I <-- "CI/CD triggers" --> J

    subgraph VPS
        subgraph Infra
            C
            I
            J
        end
        subgraph App
            D
            E
            G
            H
        end        
    end

%% Development Environment
    L(VS Code + Git + Docker)
    M(Local Docker Compose)
    N(Local Browser)
    O(Local Frontend)
    P(Local Backend)
    Q[(Local Postgres)]
    R(Local LiveKit)

    L <-- "https://git.avaaz.ai/...git" --> C
    L <-- "ssh://git@git.avaaz.ai:2222/..." --> I
    L -- "docker compose up" --> M

    M -- "Build & Run" --> O & P & Q & R

    N <-- HTTP --> O & P
    N <-- WebRTC --> R

    O <-- HTTP --> P
    P <-- SQL --> Q
    P <-- HTTP/WebRTC --> R
    P <-- WSS/WebRTC --> B
    P <-- WSS --> Z

    subgraph Development Laptop
        L
        M
        N
        subgraph Local App
            O
            P
            Q
            R
        end
    end
```

## 1. Production VPS

### 1.1 Components

#### Infra Stack

The avaaz-infra Git repository is cloned to the VPS, providing the Docker Compose stack at /srv/infra/docker-compose.yml.

| Container | Description |
| --- | --- |
| caddy | Caddy reverse proxy with automatic HTTPS (TLS termination via Let's Encrypt). |
| gitea | Gitea + Actions: Git server using SQLite; automated CI/CD workflows. |
| gitea-runner | Gitea Runner: executes CI/CD jobs defined in Gitea Actions workflows. |

#### App Stack

The avaaz-app Git repository is cloned to the VPS, providing the Docker Compose stack at /srv/app/docker-compose.yml.

| Container | Description |
| --- | --- |
| frontend | Next.js frontend: SPA/PWA interface served from a Node.js-based Next.js server. |
| backend | FastAPI + Uvicorn: backend API, auth, business logic, LiveKit orchestration, and the AI agent. |
| postgres | PostgreSQL + pgvector: persistent relational database with vector search. |
| livekit | LiveKit Server: WebRTC signaling plus UDP media for real-time audio and data. |

The backend uses several Python packages, including uv, Ruff, FastAPI, FastAPI Users, fastapi-pagination, FastStream, Pydantic, PydanticAI, pydantic-settings, the LiveKit Agents SDK, clients for the Google Gemini Live and OpenAI Realtime APIs, SQLAlchemy, Alembic, Docling, Gunicorn, Uvicorn[standard], Pyright, Pytest, Hypothesis, and HTTPX, to deliver its services.

### 1.2 Network

  • All containers join a shared proxy Docker network.
  • Caddy can route to any service by container name.
  • App services communicate internally:
    • Frontend ↔ Backend
    • Backend ↔ Postgres
    • Backend ↔ LiveKit
    • Backend (agent) ↔ LiveKit & external LLM realtime APIs
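
As an illustration of how these internal hostnames are consumed, here is a minimal configuration sketch using pydantic-settings (one of the backend's listed packages). The field names and defaults are assumptions for illustration, not the project's actual settings module:

```python
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Backend connection settings resolved over the shared Docker network.

    Illustrative only: field names, defaults, and credentials are assumptions
    and are overridden via environment variables in the real deployment.
    """

    database_url: str = "postgresql+asyncpg://app:app@postgres:5432/app"
    livekit_url: str = "ws://livekit:7880"   # signaling / control endpoint
    livekit_api_key: str = ""                # injected via environment
    livekit_api_secret: str = ""
    openai_api_key: str = ""
    gemini_api_key: str = ""


settings = Settings()  # reads environment variables at instantiation
```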

### 1.3 Public DNS Records

| Hostname | Record Type | Target | Purpose |
| --- | --- | --- | --- |
| www.avaaz.ai | CNAME | avaaz.ai | Marketing / landing site |
| avaaz.ai | A | 217.154.51.242 | Root domain |
| app.avaaz.ai | A | 217.154.51.242 | Next.js frontend (SPA/PWA) |
| api.avaaz.ai | A | 217.154.51.242 | FastAPI backend |
| rtc.avaaz.ai | A | 217.154.51.242 | LiveKit signaling + media |
| git.avaaz.ai | A | 217.154.51.242 | Gitea (HTTPS + SSH) |

### 1.4 Public Inbound Firewall Ports & Protocols

| Port | Protocol | Purpose |
| --- | --- | --- |
| 80 | TCP | HTTP, ACME HTTP-01 challenge |
| 443 | TCP | HTTPS, WSS (frontend, backend, LiveKit) |
| 2222 | TCP | Git SSH via Gitea |
| 2885 | TCP | VPS SSH access |
| 3478 | UDP | STUN/TURN |
| 5349 | TCP | TURN over TLS |
| 7881 | TCP | LiveKit TCP fallback |
| 50000-60000 | UDP | LiveKit WebRTC media |

### 1.5 Routing

#### Caddy

Caddy routes traffic from public ports 80 and 443 to internal services.

  • https://www.avaaz.ai → http://frontend:3000
  • https://app.avaaz.ai → http://frontend:3000
  • https://api.avaaz.ai → http://backend:8000
  • wss://rtc.avaaz.ai → ws://livekit:7880
  • https://git.avaaz.ai → http://gitea:3000

#### Internal Container Network

  • frontend → http://backend:8000
  • backend → postgres://postgres:5432
  • backend → http://livekit:7880 (control)
  • backend → ws://livekit:7880 (signaling)
  • backend → udp://livekit:50000-60000 (media)
  • gitea-runner → /var/run/docker.sock (Docker API on host)

#### Outgoing

  • backend → https://api.openai.com/v1/realtime/sessions
  • backend → wss://api.openai.com/v1/realtime?model=gpt-realtime
  • backend → wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent
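
As a sketch of the first of these calls, minting an ephemeral Realtime session with HTTPX; the payload fields and the OpenAI-Beta header follow OpenAI's published Realtime API shape and should be verified against the current documentation:

```python
import os

import httpx

OPENAI_SESSIONS_URL = "https://api.openai.com/v1/realtime/sessions"


async def create_realtime_session() -> dict:
    """Request an ephemeral Realtime session from OpenAI (illustrative sketch)."""
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",  # beta header used by the Realtime API
    }
    payload = {"model": "gpt-realtime", "voice": "alloy"}  # illustrative values
    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.post(OPENAI_SESSIONS_URL, headers=headers, json=payload)
        resp.raise_for_status()
        return resp.json()  # includes the ephemeral client secret for the WSS connection
```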

### 1.6 Functional Layers

#### Data Layer

Infra:

  • SQLite (Gitea)

    • Gitea stores Git metadata (users, repos, issues, Actions metadata) in /data/gitea/gitea.db.
    • This is a file-backed SQLite database inside a persistent Docker volume.
    • Repository contents are stored under /data/git/, also volume-backed.
  • Gitea Runner State

    • Gitea Actions runner stores its registration information and job metadata under /data/.runner.

App:

  • PostgreSQL with pgvector

    • Primary relational database for users, lessons, transcripts, embeddings, and conversational context.
    • Hosted in the postgres container with a persistent Docker volume.
    • Managed via SQLAlchemy and Alembic migrations in the backend (see the model sketch after this list).
  • LiveKit Ephemeral State

    • Room metadata, participant state, and signaling information are held in memory within the livekit container.
    • LiveKit's SFU media buffers and room state are not persisted across restarts.
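
As a sketch of the pgvector usage above, using SQLAlchemy's typed mappings; the table, columns, and embedding dimension are hypothetical rather than the actual schema:

```python
from pgvector.sqlalchemy import Vector
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class TranscriptChunk(Base):
    """Hypothetical table holding transcript chunks plus embeddings for similarity search."""

    __tablename__ = "transcript_chunks"  # illustrative name

    id: Mapped[int] = mapped_column(primary_key=True)
    session_id: Mapped[int] = mapped_column(index=True)
    content: Mapped[str]
    # pgvector column; 1536 dimensions is an assumption about the embedding model
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))
```

A nearest-neighbour lookup can then order results with the comparators pgvector adds, e.g. TranscriptChunk.embedding.cosine_distance(query_embedding).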

#### Control Layer

Infra:

  • Caddy

    • TLS termination (Let's Encrypt).
    • Reverse proxy and routing for all public domains.
    • ACME certificate renewal.
  • Gitea

    • Git hosting, pull/clone over SSH and HTTPS.
    • CI/CD orchestration via Actions and internal APIs.
  • Gitea Runner

    • Executes workflows and controls the Docker engine via /var/run/docker.sock.

App:

  • FastAPI Backend

    • Authentication and authorization (/auth/login, /auth/refresh, /auth/me).
    • REST APIs for lessons, progress, documents, and file handling.
    • LiveKit session management (room mapping /sessions/default, token minting /sessions/default/token, agent configuration); a token-minting sketch follows this list.
    • Calls out to OpenAI Realtime and Gemini Live APIs for AI-driven conversational behavior.
  • LiveKit Server

    • Manages room signaling, participant permissions, and session state.
    • Exposes HTTP control endpoint for room and participant management.
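
To make the token-minting responsibility concrete, a minimal sketch of such an endpoint using the livekit-api package; the route wiring, settings module, and auth dependency are assumptions:

```python
from fastapi import APIRouter, Depends
from livekit import api

from app.auth import User, get_current_user  # hypothetical auth dependency
from app.config import settings              # hypothetical settings module

router = APIRouter(prefix="/sessions")


@router.post("/default/token")
async def mint_livekit_token(user: User = Depends(get_current_user)) -> dict:
    """Mint a short-lived, room-scoped LiveKit access token for the caller."""
    token = (
        api.AccessToken(settings.livekit_api_key, settings.livekit_api_secret)
        .with_identity(str(user.id))
        .with_grants(api.VideoGrants(room_join=True, room=f"session-{user.id}"))
        .to_jwt()
    )
    return {"token": token, "url": "wss://rtc.avaaz.ai"}
```

A fresh token is minted on every (re)join, which matches the reconnection flow described in the workflows below.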

#### Media Layer

App:

  • User Audio Path

    • Browser/mobile → LiveKit:
      • WSS signaling via rtc.avaaz.ai → Caddy → livekit:7880.
      • UDP audio and data channels via rtc.avaaz.ai:50000-60000 directly to LiveKit on the VPS.
    • WebRTC handles ICE, STUN/TURN, jitter buffers, and Opus audio encoding.
  • AI Agent Audio Path

    • The agent logic inside the backend uses LiveKit Agent SDK to join rooms as a participant.
    • Agent → LiveKit:
      • WS signaling over the internal Docker network (ws://livekit:7880).
      • UDP audio transport as part of its WebRTC session.
    • Agent → LLM realtime API:
      • Secure WSS/WebRTC connection to OpenAI Realtime or Gemini Live.
    • The agent transcribes, processes, and generates audio responses, publishing them into the LiveKit room so the user hears natural speech.
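
A minimal sketch of the agent side of this path, assuming the LiveKit Agents 1.x Python API with the OpenAI Realtime plugin; the instructions, voice, and wiring are placeholders rather than the actual agent implementation:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai


async def entrypoint(ctx: agents.JobContext) -> None:
    """Join the LiveKit room as a participant and drive speech through a realtime LLM."""
    await ctx.connect()  # WebRTC join over the internal Docker network

    session = AgentSession(
        # Speech-to-speech model; a Gemini Live model could be swapped in via the google plugin.
        llm=openai.realtime.RealtimeModel(voice="alloy"),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a friendly language tutor."),  # placeholder prompt
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

The worker typically reads LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET from the environment, so inside the VPS it points at ws://livekit:7880.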

### 1.7 CI/CD Pipeline

Production CI/CD is handled by Gitea Actions running on the VPS. The gitea-runner container has access to the host Docker daemon and is responsible for both validation and deployment:

  • .gitea/workflows/ci.yml: Continuous Integration (branch/PR validation, no deployment).
  • .gitea/workflows/cd.yml: Continuous Deployment (tag-based releases to production).

#### Build Phase (CI Workflow: ci.yml)

Triggers

  • push to:
    • feature/**
    • bugfix/**
  • pull_request targeting main.

Runner & Environment

  • Runs on the self-hosted runner labeled linux_amd64.
  • Checks out the relevant branch or PR commit from the avaaz-app repository into the runner's workspace.

Steps

  1. Checkout code
    Uses actions/checkout@v4 to fetch the branch or PR head commit.

  2. Report triggering context
    Logs the event type (push or pull_request) and branches:

    • For push: the source branch (e.g., feature/foo).
    • For pull_request: source and target (main).
  3. Static analysis & tests

    • Run linters, type checkers, and unit tests for backend and frontend.
    • Ensure the application code compiles/builds.
  4. Build Docker images for CI

    • Build images (e.g., frontend:ci and backend:ci) to validate Dockerfiles and build chain.
    • These images are tagged for CI only and not used for production.
  5. Cleanup CI images

    • Remove CI-tagged images at the end of the job (even on failure) to prevent disk usage from accumulating.

Outcome

  • A green CI result on a branch/PR signals that:
    • The code compiles/builds.
    • Static checks and tests pass.
    • Docker images can be built successfully.
  • CI does not modify the production stack and does not depend on tags.

#### Deploy Phase (CD Workflow: cd.yml)

Triggers

  • Creation of a Git tag matching v* that points to a commit on the main branch in the avaaz-app repository.

Runner & Environment

  • Runs on the same linux_amd64 self-hosted runner.
  • Checks out the exact commit referenced by the tag.

Steps

  1. Checkout tagged commit

    • Uses actions/checkout@v4 with ref: ${{ gitea.ref }} to check out the tagged commit.
  2. Tag validation

    • Fetches origin/main.
    • Verifies that the tag commit is an ancestor of origin/main (i.e., the tag points to code that has been merged into main).
    • Fails the deployment if the commit is not in main's history.
  3. Build & publish release

    • Builds production Docker images for frontend, backend, LiveKit, etc., tagged with the version (e.g., v0.1.0).
    • Applies database migrations (e.g., via Alembic) if required.
  4. Restart production stack

    • Restarts or recreates the app stack containers using the newly built/tagged images (e.g., via docker compose -f docker-compose.yml up -d).
  5. Health & readiness checks

    • Probes key endpoints with curl -f, such as:
      • https://app.avaaz.ai
      • https://api.avaaz.ai/health
      • wss://rtc.avaaz.ai (signaling-level check)
    • If checks fail, marks the deployment as failed and automatically rolls back to previous images.

Outcome

  • Only tagged releases whose commits are on the main branch are deployed.
  • Deployment is explicit (tag-based), separated from CI validation.

### 1.8 Typical Workflows

#### User Login

  1. Browser loads the frontend from https://app.avaaz.ai.
  2. Frontend submits credentials to POST https://api.avaaz.ai/auth/login.
  3. Backend validates credentials and returns (shape sketched after this list):
    • A short-lived JWT access token
    • A long-lived opaque refresh token
    • A minimal user profile for immediate UI hydration
  4. Frontend stores tokens appropriately (access token in memory; refresh token in secure storage or an httpOnly cookie).
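
The response in step 3 could be modelled roughly as follows; the field names are illustrative, not the backend's actual schema:

```python
from pydantic import BaseModel


class UserProfile(BaseModel):
    """Minimal profile returned alongside the tokens for immediate UI hydration."""

    id: int
    email: str
    display_name: str


class LoginResponse(BaseModel):
    """Illustrative shape of the POST /auth/login response."""

    access_token: str   # short-lived JWT, kept in memory by the frontend
    refresh_token: str  # long-lived opaque token, kept in secure storage or an httpOnly cookie
    token_type: str = "bearer"
    user: UserProfile
```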

#### Load Persistent Session

  1. Frontend calls GET https://api.avaaz.ai/sessions/default.
  2. Backend retrieves or creates the user's persistent conversational session (a get-or-create sketch follows this list), which encapsulates:
    • Long-running conversation state
    • Lesson and progress context
    • Historical summary for LLM context initialization
  3. Backend prepares the session's LLM context so that the agent can join with continuity.
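
A minimal get-or-create sketch for step 2, assuming SQLAlchemy's async session and hypothetical model and dependency names:

```python
from fastapi import APIRouter, Depends
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.auth import User, get_current_user  # hypothetical auth dependency
from app.db import get_db                    # hypothetical DB session dependency
from app.models import ConversationSession   # hypothetical model

router = APIRouter(prefix="/sessions")


@router.get("/default")
async def get_default_session(
    user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
) -> dict:
    """Return the user's persistent conversational session, creating it on first use."""
    session = await db.scalar(
        select(ConversationSession).where(ConversationSession.user_id == user.id)
    )
    if session is None:
        session = ConversationSession(user_id=user.id)  # lesson/progress context starts empty
        db.add(session)
        await db.commit()
        await db.refresh(session)
    return {"session_id": session.id}  # the real response also carries lesson state and a history summary
```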

#### Join the Live Conversation Session

  1. Frontend requests a LiveKit access token via POST https://api.avaaz.ai/sessions/default/token.
  2. Backend generates a new LiveKit token (short-lived, room-scoped), containing:
    • Identity
    • Publish/subscribe permissions
    • Expiration (only relevant for the initial join; rejoining uses a freshly minted token)
    • Room ID corresponding to the session
  3. Frontend connects to the LiveKit server:
    • WSS for signaling
    • UDP/SCTP for low-latency audio and file transfer
  4. If the user disconnects, the frontend requests a new LiveKit token before rejoining, ensuring seamless continuity.

#### Conversation with AI Agent

  1. Backend configures the session's AI agent using:
    • Historical summary
    • Current lesson state
    • Language settings and mode (lesson, mock exam, free talk)
  2. The agent joins the same LiveKit room as a participant.
  3. All media flows through LiveKit:
    • User → audio → LiveKit → Agent
    • Agent → LLM realtime API → synthesized audio → LiveKit → User
  4. The agent guides the user verbally: continuing lessons, revisiting material, running mock exams, or free conversation.

The user experiences this as a continuous, ongoing session with seamless reconnection and state persistence.

### 1.9 Hardware

| Class | Description |
| --- | --- |
| system | Standard PC (i440FX + PIIX, 1996) |
| bus | Motherboard |
| memory | 96KiB BIOS |
| processor | AMD EPYC-Milan Processor |
| memory | 8GiB System Memory |
| bridge | 440FX - 82441FX PMC [Natoma] |
| bridge | 82371SB PIIX3 ISA [Natoma/Triton II] |
| communication | PnP device PNP0501 |
| input | PnP device PNP0303 |
| input | PnP device PNP0f13 |
| storage | PnP device PNP0700 |
| system | PnP device PNP0b00 |
| storage | 82371SB PIIX3 IDE [Natoma/Triton II] |
| bus | 82371SB PIIX3 USB [Natoma/Triton II] |
| bus | UHCI Host Controller |
| input | QEMU USB Tablet |
| bridge | 82371AB/EB/MB PIIX4 ACPI |
| display | QXL paravirtual graphic card |
| generic | Virtio RNG |
| storage | Virtio block device |
| disk | 257GB Virtual I/O device |
| volume | 238GiB EXT4 volume |
| volume | 4095KiB BIOS Boot partition |
| volume | 105MiB Windows FAT volume |
| volume | 913MiB EXT4 volume |
| network | Virtio network device |
| network | Ethernet interface |
| input | Power Button |
| input | AT Translated Set 2 keyboard |
| input | VirtualPS/2 VMware VMMouse |

## 2. Development Laptop

### 2.1 Components

#### App Stack (local Docker)

  • frontend (Next.js SPA)
  • backend (FastAPI)
  • postgres (PostgreSQL + pgvector)
  • livekit (local LiveKit Server)

No Caddy is deployed locally; the browser talks directly to the mapped container ports on localhost.

### 2.2 Network

  • All services run as Docker containers on a shared Docker network.
  • Selected ports are published to localhost for direct access from the browser and local tools.
  • No public domains are used in development; everything is addressed via http://localhost/....

### 2.3 Domains & IP Addresses

Local development uses:

  • http://localhost:3000 → frontend (Next.js dev/server container)
  • http://localhost:8000 → backend API (FastAPI)
    • Example auth/session endpoints:
      • POST http://localhost:8000/auth/login
      • GET http://localhost:8000/sessions/default
      • POST http://localhost:8000/sessions/default/token
  • ws://localhost:7880 → LiveKit signaling (local LiveKit server)
  • udp://localhost:50000-60000 → LiveKit/WebRTC media

No /etc/hosts changes or TLS certificates are required; localhost acts as a secure origin for WebRTC.

### 2.4 Ports & Protocols

| Port | Protocol | Purpose |
| --- | --- | --- |
| 3000 | TCP | Frontend (Next.js) |
| 8000 | TCP | Backend API (FastAPI) |
| 5432 | TCP | Postgres + pgvector |
| 7880 | TCP | LiveKit HTTP + WS signaling |
| 50000-60000 | UDP | LiveKit WebRTC media (audio, data) |

### 2.5 Routing

No local Caddy or reverse proxy layer is used; routing is direct via published ports.

#### Internal Container Routing (Docker network)

  • Backend → Postgres: postgres://postgres:5432
  • Backend → LiveKit: http://livekit:7880
  • Frontend (server-side) → Backend: http://backend:8000

#### Browser → Containers (via localhost)

  • Browser → Frontend: http://localhost:3000
  • Browser → Backend API: http://localhost:8000

#### Outgoing (from Backend)

  • backend → https://api.openai.com/v1/realtime/sessions
  • backend → wss://api.openai.com/v1/realtime?model=gpt-realtime
  • backend → wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent

These calls mirror production agent behavior while pointing to the same cloud LLM realtime endpoints.

### 2.6 Functional Layers

#### Data Layer

  • Local Postgres instance mirrors the production schema (including pgvector).
  • Database migrations are applied via backend tooling (e.g., Alembic) to keep schema in sync.
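
Where migrations need to be applied programmatically (for example from a startup hook or a small deploy script), a minimal sketch using Alembic's Python API, assuming the standard alembic.ini layout:

```python
from alembic import command
from alembic.config import Config


def upgrade_database() -> None:
    """Apply all pending Alembic migrations, equivalent to `alembic upgrade head`."""
    config = Config("alembic.ini")  # path relative to the backend working directory
    command.upgrade(config, "head")


if __name__ == "__main__":
    upgrade_database()
```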

#### Control Layer

  • Backend runs full application logic locally:
    • Authentication and authorization
    • Lesson and progress APIs
    • LiveKit session management (/sessions/default, /sessions/default/token) and agent control
  • Frontend integrates against the same API surface as production, only with localhost URLs.

#### Media Layer

  • Local LiveKit instance handles:
    • WS/HTTP signaling on port 7880
    • WebRTC media (audio + data channels) on UDP 50000-60000
  • Agent traffic mirrors production logic:
    • LiveKit ↔ Backend ↔ LLM realtime APIs (OpenAI / Gemini).

### 2.7 Typical Workflows

#### Developer Pushes Code

  1. Developer pushes to git.avaaz.ai over HTTPS or SSH.
  2. CI runs automatically (linting, tests, build validation). No deployment occurs.
  3. When a release is ready, the developer creates a version tag (v*) on a commit in main.
  4. CD triggers: validates the tag, rebuilds from the tagged commit, deploys updated containers, then performs post-deploy health checks.

#### App Development

  • Start the stack: docker compose -f docker-compose.dev.yml up -d
  • Open the app in the browser: http://localhost:3000
  • Frontend calls the local backend for:
    • POST http://localhost:8000/auth/login
    • GET http://localhost:8000/sessions/default
    • POST http://localhost:8000/sessions/default/token

#### API Testing

  • Health check: curl http://localhost:8000/health

  • Auth and session testing:

    curl -X POST http://localhost:8000/auth/login \
         -H "Content-Type: application/json" \
         -d '{"email": "user@example.com", "password": "password"}'
    
    curl http://localhost:8000/sessions/default \
         -H "Authorization: Bearer <access_token>"
    
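The same flow can be written as a pytest test using HTTPX (both part of the backend toolchain); the credentials and the access_token field name are placeholders:

```python
import httpx

BASE_URL = "http://localhost:8000"


def test_login_and_load_default_session() -> None:
    """Log in against the local backend, then fetch the default session with the returned token."""
    with httpx.Client(base_url=BASE_URL) as client:
        login = client.post(
            "/auth/login",
            json={"email": "user@example.com", "password": "password"},  # placeholder credentials
        )
        assert login.status_code == 200
        token = login.json()["access_token"]  # assumed field name; adjust to the real response

        session = client.get(
            "/sessions/default",
            headers={"Authorization": f"Bearer {token}"},
        )
        assert session.status_code == 200
```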

#### LiveKit Testing

  • Frontend connects to LiveKit via:
    • Signaling: ws://localhost:7880
    • WebRTC media: udp://localhost:50000-60000
  • Backend issues local LiveKit tokens via POST http://localhost:8000/sessions/default/token, then connects the AI agent to the local room.

### 2.8 Hardware

| Class | Description |
| --- | --- |
| system | HP Laptop 14-em0xxx |
| bus | 8B27 motherboard bus |
| memory | 128KiB BIOS |
| processor | AMD Ryzen 3 7320U |
| memory | 256KiB L1 cache |
| memory | 2MiB L2 cache |
| memory | 4MiB L3 cache |
| memory | 8GiB System Memory |
| bridge | Family 17h-19h PCIe Root Complex |
| generic | Family 17h-19h IOMMU |
| storage | SK hynix BC901 HFS256GE SSD |
| disk | 256GB NVMe disk |
| volume | 299MiB Windows FAT volume |
| volume | 238GiB EXT4 volume |
| network | RTL8852BE PCIe 802.11ax Wi-Fi |
| display | Mendocino integrated graphics |
| multimedia | Rembrandt Radeon High Definition Audio |
| generic | Family 19h PSP/CCP |
| bus | AMD xHCI Host Controller |
| input | Logitech M705 Mouse |
| input | Logitech K370s/K375s Keyboard |
| multimedia | Jabra SPEAK 510 USB |
| multimedia | Logitech Webcam C925e |
| communication | Bluetooth Radio |
| multimedia | HP True Vision HD Camera |
| bus | FCH SMBus Controller |
| bridge | FCH LPC Bridge |
| power | AE03041 Battery |
| input | Power Button |
| input | Lid Switch |
| input | HP WMI Hotkeys |
| input | AT Translated Set 2 Keyboard |
| input | Video Bus |
| input | SYNA32D9:00 06CB:CE17 Mouse |
| input | SYNA32D9:00 06CB:CE17 Touchpad |
| network | Ethernet Interface |