# Avaaz Implementation Plan

This implementation plan translates the Product Requirements (`docs/PRD.md`), product description (`README.md`), and system architecture (`docs/architecture.md`) into concrete, phased engineering work for a production-grade Avaaz deployment. The goal is to deliver an end-to-end, voice-first AI speaking coach that supports learners from A1 to B2, with B2 oral exam readiness as the primary outcome.

---

## 1. Guiding Principles

- **B2 exam readiness first, A1–B2 capable:** Design features and data models to support A1–B2 learners, but prioritize workflows that move learners toward B2 oral exam success.
- **Voice-first, text-strong:** Optimize for real-time speech-to-speech interaction, with robust transcripts and text UX as first-class companions.
- **Single source of truth:** Keep curriculum, lessons, transcripts, and analytics centralized in PostgreSQL + pgvector; no separate vector store.
- **Continuous sessions:** All conversations run within persistent sessions (`/sessions/default`), preserving state across reconnects.
- **Infrastructure parity:** The development Docker stack mirrors the production VPS stacks (infra/app), as described in `docs/architecture.md`.
- **Security and privacy:** Apply strong auth, least-privilege access, safe logging, and clear retention policies for voice/transcript data.

---

## 2. High-Level Phasing

### Phase 1 – Foundation (M0–M2)

- Set up core infrastructure (Dockerized backend, frontend, LiveKit, Postgres+pgvector, Caddy).
- Implement authentication, the user model, and basic session handling.
- Implement a minimal voice conversation loop (user ↔ AI tutor) with basic transcripts.
- Define the initial CEFR-aware curriculum data model and seed a small set of lessons.

### Phase 2 – Learning Experience & Analytics (M3–M4)

- Implement the full A1–B2 curriculum representation, scenarios, and level-aware adaptive tutoring.
- Add a progress dashboard, gamification basics, and post-session summaries.
- Implement AI-assisted lesson authoring and learner-upload-based lessons.
- Introduce mock exam templates (A1–A2, A2–B1, B1–B2) and B2-focused exam reports.

### Phase 3 – Scale, Reliability & Monetization (M5–M6)

- Harden infrastructure (observability, health checks, admin dashboards).
- Add subscription plans and Stripe integration.
- Optimize performance (latency, concurrency), tune analytics pipelines, and finalize launch-readiness tasks.

---

## 3. Backend Workstream (FastAPI + LiveKit + LLMs)

### 3.1 Core Service Setup

**Goals**

- Production-ready FastAPI service with auth, sessions, and integrations.

**Tasks**

- Use the existing backend layout under `app/backend` as the foundation:
  - `app/backend/main.py` – app factory and router wiring.
  - `app/backend/core/config.py` – Pydantic-settings for core configuration (DB URL, LLM keys, LiveKit, Stripe, etc.).
  - `app/backend/core/database.py` – database/session utilities; extend to add SQLAlchemy, pgvector, and Alembic integration.
  - `app/backend/api/v1/router.py` – versioned API router aggregator; include routers from feature and operation modules (existing `features.auth`, `operations.health`, plus future `lessons`, `chat`, `documents`).
  - `app/backend/features/*` and `app/backend/operations/*` – domain logic and HTTP routers (e.g., auth, lessons, chat, payments, document upload, health).
- Implement base middleware (CORS, logging, request ID, error handling).
- Ensure `/health`, `/health/live`, and `/health/ready` endpoints are wired and return basic dependency checks (DB connectivity, LiveKit reachability, LLM connectivity where safe); see the readiness sketch below.

**Deliverables**

- Running FastAPI service in Docker with `/health` OK and OpenAPI docs available.
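For the readiness probe above, a minimal sketch follows. The router placement, the `DATABASE_URL` environment variable, and the check granularity are illustrative assumptions; in the real service the engine would come from `app/backend/core/database.py`.

```python
# Hypothetical readiness probe sketch, e.g. for app/backend/operations/health.py.
# Engine construction is inlined here for self-containment; names are assumptions.
import os

from fastapi import APIRouter
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(os.environ["DATABASE_URL"])  # assumed env var name

router = APIRouter(prefix="/health", tags=["health"])


@router.get("")
@router.get("/live")
async def live() -> dict:
    # Liveness: the process is up and able to serve requests.
    return {"status": "ok"}


@router.get("/ready")
async def ready() -> dict:
    # Readiness: verify critical dependencies before accepting traffic.
    checks: dict[str, str] = {}
    try:
        async with engine.connect() as conn:
            await conn.execute(text("SELECT 1"))
        checks["database"] = "ok"
    except Exception as exc:
        checks["database"] = f"error: {exc}"
    # LiveKit and LLM probes would follow the same pattern where safe.
    healthy = all(v == "ok" for v in checks.values())
    return {"status": "ok" if healthy else "degraded", "checks": checks}
```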
### 3.2 Data Model & Persistence

**Goals**

- Support A1–B2 curriculum, lessons, sessions, transcripts, and analytics in PostgreSQL + pgvector.

**Tasks**

- Design and implement SQLAlchemy models:
  - `User` – profile, locale, target level, subscription plan, preferences.
  - `CurriculumObjective` – per level (A1–B2), skill (reception, production, interaction, mediation), descriptor text.
  - `Lesson` – CEFR level, objectives, type (lesson, scenario, exam part), metadata (topic, context).
  - `Scenario` / `ScenarioStep` – structured oral tasks (self-presentation, picture description, opinion exchange, arguing a statement) with configuration for timing and mode (individual/pair).
  - `Session` – persistent conversational session per user (mode, state, summary, last_activity_at).
  - `Turn` – individual utterances with role (user/AI), timestamps, raw transcript, audio reference, CEFR difficulty metadata.
  - `ExamTemplate` / `ExamPart` – A1–A2, A2–B1, B1–B2 templates with timing, task types, scoring dimensions.
  - `ExamAttempt` / `ExamScore` – attempt metadata, estimated CEFR level, component scores.
  - `UploadDocument` / `DocumentChunk` – files and parsed chunks with `vector` embeddings (stored alongside or extending the existing backend package under `app/backend`).
  - `ProgressSnapshot` – aggregate metrics for dashboards (per user and optionally per program).
  - `Subscription` / `PaymentEvent` – billing state and usage limits.
    - **Note:** Seed the database with the specific plans defined in `README.md` (First Light, Spark, Glow, Shine, Radiance) and their respective limits.
- Add the related Alembic migrations; verify they run cleanly on the dev DB.

**Deliverables**

- Migrations and models aligned with the PRD feature set and architecture (a model sketch follows).
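As an illustrative sketch of the pgvector integration for two of these models, using the `pgvector` SQLAlchemy bindings; table/column names and the 1536-dimension embedding are assumptions, not a final schema:

```python
# Hypothetical model sketch; field names and the embedding dimension are
# assumptions to illustrate the pgvector integration, not a final schema.
import datetime
import uuid

from pgvector.sqlalchemy import Vector
from sqlalchemy import DateTime, ForeignKey, String, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class DocumentChunk(Base):
    __tablename__ = "document_chunks"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    document_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("upload_documents.id"))
    content: Mapped[str] = mapped_column(Text)
    # pgvector column; the dimension must match the embedding model in use.
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))


class Turn(Base):
    __tablename__ = "turns"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    session_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("sessions.id"))
    role: Mapped[str] = mapped_column(String(16))  # "user" or "ai"
    transcript: Mapped[str] = mapped_column(Text)
    created_at: Mapped[datetime.datetime] = mapped_column(
        DateTime(timezone=True),
        default=lambda: datetime.datetime.now(datetime.timezone.utc),
    )
```

The embedding dimension would normally live in configuration rather than be hard-coded, since it is tied to whichever embedding model is chosen.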
### 3.3 Authentication & User Management

**Goals**

- Secure user auth using FastAPI Users (or equivalent) with JWT and refresh tokens.

**Tasks**

- Configure FastAPI Users:
  - Email/password registration, login, password reset, email verification.
  - Role support (learner, instructor, admin) for curriculum authoring and admin dashboards.
- Integrate auth into routes (`dependencies.py` with `current_user`).
- Implement `users` endpoints for profile (target CEFR level, locale) and preferences (greeting verbosity, data retention preferences).

**Deliverables**

- Auth flows working in the backend and callable from Postman/curl.

### 3.4 Session Management & Transcripts

**Goals**

- Provide continuous session behavior with persistent history, as described in `docs/architecture.md` and the PRD.

**Tasks**

- Implement `GET /sessions/default`:
  - Create or fetch the `Session` for the current user.
  - Load summary, current lesson state, and progress context.
- Implement `POST /sessions/default/token`:
  - Generate a short-lived LiveKit token with room identity tied to the session (see the token sketch after this section).
- Integrate with the LiveKit Agent:
  - Implement an LLM integration module (for example under `app/backend/features/llm.py` or similar) that configures the realtime session using the historical summary, current goals, and mode (lesson/mock exam/free).
- Implement transcript persistence:
  - Receive partial/final transcripts from LiveKit/agent.
  - Append `Turn` records and maintain rolling summaries for context.
  - Respect retention settings.
- Implement a post-session summarization endpoint / background job:
  - Generate a per-session summary, strengths/weaknesses, and recommended next steps.
- Implement on-demand translation:
  - Endpoint (e.g., `/chat/translate`) or integrated socket message to translate user/AI text between target and native languages (supporting PRD Section 4.2).

**Deliverables**

- API and background flows that maintain continuous conversational context per user.
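A minimal sketch of the token mint behind `POST /sessions/default/token`, assuming the `livekit-api` Python package; the helper name, the 10-minute TTL, and the room-naming convention are illustrative:

```python
# Hypothetical sketch of the LiveKit token mint for POST /sessions/default/token.
# Key/secret loading and session lookup are elided; all names here are
# illustrative assumptions, not existing Avaaz code.
import os
from datetime import timedelta

from livekit import api  # provided by the livekit-api package


def mint_session_token(user_id: str, session_id: str, display_name: str) -> str:
    """Return a short-lived JWT that joins the room tied to this session."""
    token = (
        api.AccessToken(os.environ["LIVEKIT_API_KEY"], os.environ["LIVEKIT_API_SECRET"])
        .with_identity(user_id)
        .with_name(display_name)
        .with_ttl(timedelta(minutes=10))  # short-lived, per the task above
        .with_grants(api.VideoGrants(room_join=True, room=f"session-{session_id}"))
    )
    return token.to_jwt()
```

Binding the room name to the session id means reconnects land in the same conversational context, matching the continuous-session principle in section 1.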
### 3.5 Curriculum & Lesson APIs

**Goals**

- Expose the CEFR-aligned curriculum and lesson content to the frontend and agent.

**Tasks**

- Implement the `lessons` router:
  - List lessons by level, topic, and recommended next steps.
  - Fetch details for a specific lesson, including objectives and scenario steps.
  - Mark lesson progress and completion; update `ProgressSnapshot`.
- Implement endpoints for curriculum objectives and their mapping to lessons.
- Implement endpoints to retrieve scenario templates for mock exams and regular lessons.

**Deliverables**

- Stable JSON API for curriculum and lessons, used by the frontend and agent system prompts.

### 3.6 AI-Assisted Authoring & User Uploads

**Goals**

- Support instructor-designed and learner-generated lessons from uploaded materials.

**Tasks**

- Implement the `documents` router:
  - File upload endpoints for documents and images (instructor and learner scopes).
  - Trigger the document processing pipeline (Docling or similar) to parse text and structure.
  - Chunk documents and store embeddings in `DocumentChunk` using pgvector.
- Implement instructor authoring endpoints:
  - Create/update/delete lessons referencing uploaded documents/images.
  - AI-assisted suggestion endpoint that uses an LLM to propose lesson structure, prompts, and exam-style tasks conditioned on level and objectives.
- Implement learner upload endpoints:
  - User-specific upload and lesson creation (on-the-fly lessons).
  - Link created “ad-hoc” lessons to sessions so the tutor can reference them during practice.

**Deliverables**

- Endpoints supporting both admin/instructor authoring and user-driven contextual lessons.

### 3.7 Mock Exam Engine & Scoring

**Goals**

- Implement configurable mock oral exam flows for A1–A2, A2–B1, and B1–B2, with B2 focus.

**Tasks**

- Implement the exam orchestration service:
  - Given an `ExamTemplate`, manage progression through `ExamPart`s (including warm-up, individual tasks, pair tasks).
  - Enforce timing and mode flags to drive agent prompts.
- Integrate scoring (see the sketch after this section):
  - Use an LLM to derive component scores (fluency, pronunciation, grammar, vocabulary, coherence) from transcripts.
  - Map them to an estimated CEFR band and store the result in `ExamScore`.
- Expose endpoints:
  - Start an exam, fetch exam status, retrieve past exam results.

**Deliverables**

- End-to-end exam session that runs via the same LiveKit + agent infrastructure and stores exam results.
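As a hedged sketch of the score-folding step only (the LLM call that produces the component scores is elided): the 0–100 scale, equal weighting, and CEFR band cut-offs below are placeholder assumptions that would need calibration against real rubrics.

```python
# Hypothetical scoring sketch; the component weights, band thresholds, and
# 0-100 scale are illustrative assumptions, not a validated rubric.
from dataclasses import dataclass

COMPONENTS = ("fluency", "pronunciation", "grammar", "vocabulary", "coherence")

# Assumed cut-offs on a 0-100 composite scale; the final entry guarantees a match.
BAND_THRESHOLDS = [(80, "B2"), (65, "B1"), (45, "A2"), (0, "A1")]


@dataclass
class ExamScoreResult:
    components: dict[str, float]
    composite: float
    estimated_band: str


def score_from_llm_components(components: dict[str, float]) -> ExamScoreResult:
    """Fold LLM-produced component scores into a composite and a CEFR band."""
    missing = set(COMPONENTS) - components.keys()
    if missing:
        raise ValueError(f"missing component scores: {missing}")
    composite = sum(components[c] for c in COMPONENTS) / len(COMPONENTS)
    band = next(label for cutoff, label in BAND_THRESHOLDS if composite >= cutoff)
    return ExamScoreResult(components=components, composite=composite, estimated_band=band)
```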
### 3.8 Analytics & Reporting

**Goals**

- Provide learner-level dashboards and program-level reporting.

**Tasks**

- Implement periodic aggregation (cron/async tasks) populating `ProgressSnapshot`.
- Implement analytics endpoints:
  - Learner metrics (minutes spoken, session counts, trends per skill).
  - Program-level metrics (for instructors/coordinators) with appropriate role-based access.
- Ensure privacy controls (anonymized or pseudonymized data where required).

**Deliverables**

- Backend API supporting progress dashboards and reports per the PRD.

---

## 4. Frontend Workstream (Next.js + LiveKit)

### 4.1 Foundation & Layout

**Goals**

- Production-ready Next.js PWA frontend that matches Avaaz branding and supports auth, routing, and basic pages.

**Tasks**

- Initialize the Next.js app (per `README.md`):
  - Configure `next.config.js`, TypeScript, ESLint, the PWA manifest, and global styles.
- Implement `app/layout.tsx` with theme, localization provider, and navigation.
- Implement the `app/page.tsx` landing page aligned with the product positioning (A1–B2, B2 focus).
- Implement auth pages (login, register, email verification, forgot password).

**Deliverables**

- Frontend skeleton running under Docker, reachable via Caddy in the dev stack.

### 4.2 Chat & Voice Experience

**Goals**

- Voice-first conversational UI integrated with LiveKit and backend sessions.

**Tasks**

- Build `ChatInterface.tsx`:
  - Microphone controls, connection status, and basic waveform/level visualization.
  - Rendering of AI and user turns with text and visual aids (images, tables) as provided by the backend/agent.
  - **Translation support:** UI controls to translate specific messages on demand (toggle or click-to-translate).
  - Error states for mic and network issues; text-only fallback UI.
- Integrate with the backend session APIs:
  - On login, call `GET /sessions/default`, then `POST /sessions/default/token`.
  - Connect to LiveKit using the token; handle reconnection logic.
- Display a contextual greeting and summary on session start using data returned from the `sessions` API.

**Deliverables**

- Usable chat interface capable of sustaining a real-time conversation with the AI tutor.

### 4.3 Curriculum & Lesson UX

**Goals**

- Allow learners to browse the curriculum, start lessons, and view progress.

**Tasks**

- Implement the curriculum overview page:
  - Display modules and lessons grouped by CEFR level (A1–B2).
  - Indicate completion and recommended next lessons.
- Implement the lesson detail page:
  - Show lesson goals, target level, estimated time, and exam-related tags.
  - Start lesson → opens the chat view in the appropriate mode with lesson context.
- Integrate progress indicators (streaks, minutes, CEFR band) into the dashboard.

**Deliverables**

- Navigation and views covering the core learning flows described in the PRD.

### 4.4 Mock Exam UX

**Goals**

- Implement exam-specific UX consistent with oral exams and the PRD.

**Tasks**

- Build the exam selection page:
  - Allow the user to choose the exam level (A1–A2, A2–B1, B1–B2/B2 mock).
- In-session exam UI:
  - Show the current exam part, timer, and appropriate instructions.
  - Indicate whether the current part is scored or warm-up.
- Results page:
  - Show the estimated CEFR level, component scores, and textual feedback.
  - Provide links to detailed transcripts, audio, and recommended follow-up lessons.

**Deliverables**

- End-to-end exam flow from selection to results.

### 4.5 AI-Assisted Authoring & Upload UX

**Goals**

- Provide UIs for instructors and learners to upload content and create lessons.

**Tasks**

- Instructor interface:
  - Lesson builder UI with level, objectives, exam part, and document selection.
  - “Generate with AI” action to fetch suggested prompts/structure; edit in place and publish.
- Learner interface:
  - Simple upload flow (document/image) to create ad-hoc practice.
  - Quick-start buttons to jump from uploaded content to a tailored lesson in the chat interface.

**Deliverables**

- Authoring tools that map onto the backend authoring APIs.

### 4.6 Analytics & Admin Health Dashboard

**Goals**

- Provide admin and instructor dashboards for system health and learner analytics.

**Tasks**

- Learner dashboard:
  - Visualize key metrics and streaks, integrated with backend analytics.
- Instructor/program dashboard:
  - Aggregate usage and progress metrics for groups.
- Admin health dashboard:
  - Surface backend `/health` status, LiveKit status, DB health indicators, and LLM connectivity signals.

**Deliverables**

- Dashboards that satisfy the PRD’s analytics and health visibility requirements.

---

## 5. Real-Time & Media Workstream (LiveKit + Agents)

### 5.1 LiveKit Server & Config

**Tasks**

- Use the existing `livekit` service in `app/docker-compose.yml` as the basis: keep signaling on port 7880, keep the WebRTC media port range configurable via environment variables (currently defaulting to 60000–60100), and keep the service attached to the shared `proxy` network used by Caddy.
- Ensure secure API keys and appropriate room/track settings for voice-only sessions.
- Configure UDP ports and signaling endpoints (`rtc.avaaz.ai` → Caddy → `livekit:7880`) as described in `docs/architecture.md` and `infra/Caddyfile`.

### 5.2 Client Integration

**Tasks**

- Wire the frontend to LiveKit:
  - Use the `livekit-client` SDK to join rooms using tokens from the backend.
  - Handle reconnection and session resumption.
- Integrate with backend session and agent orchestration.

### 5.3 Agent Integration with Realtime LLMs

**Tasks**

- Implement a LiveKit Agent that:
  - Connects to OpenAI Realtime or Gemini Live according to configuration.
  - Streams user audio and receives streamed AI audio and partial transcripts.
  - Forwards transcripts and metadata to the backend for persistence.
- Implement prompt templates for:
  - Regular lessons, mock exams, and free conversation.
  - CEFR-level adaptation and exam-specific tasks.

**Deliverables**

- Stable real-time pipeline from user microphone to LLM and back, integrated with backend logic (a minimal agent sketch follows).
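A minimal agent sketch following the shape of the `livekit-agents` Python framework; the OpenAI Realtime plugin, voice, and instructions are illustrative placeholders, and transcript forwarding to the backend is elided:

```python
# Hypothetical agent entrypoint sketch based on the livekit-agents framework.
# Voice, instructions, and plugin choice are placeholders, not final config.
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai


async def entrypoint(ctx: agents.JobContext) -> None:
    # One realtime session per room; the model handles speech-to-speech directly.
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(voice="alloy"),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are Avaaz, a CEFR-aware speaking tutor."),
    )
    await ctx.connect()


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

Transcript events from the session would be forwarded to the persistence endpoints described in section 3.4.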
---

## 6. Infrastructure & DevOps Workstream

### 6.1 Docker & Compose

**Tasks**

- Define and refine:
  - `infra/docker-compose.yml` – infra stack (Caddy, Gitea, Gitea runner).
  - `app/docker-compose.yml` – app stack (frontend, backend, LiveKit, Postgres+pgvector).
- Configure volumes and networks (`proxy` network for routing via Caddy).

### 6.2 CI & CD (Gitea + Actions)

**Tasks**

- CI:
  - Extend `.gitea/workflows/ci.yml` to run linting, type-checking, and tests for the backend and frontend once those projects are scaffolded under `app/backend` and `app/frontend`.
  - Add build verification for any Docker images produced for the app stack.
- CD:
  - Use `.gitea/workflows/cd.yml` as the tag-based deploy workflow, following the deployment approach in `docs/architecture.md`.
  - Deploy `v*` tags only when they point to commits on `main`.
  - Use `/health` and key endpoints for readiness checks; roll back on failures.

### 6.3 Observability & Monitoring

**Tasks**

- Centralize logs and metrics for the backend, frontend, LiveKit, and Postgres.
- Configure alerting for:
  - Application errors.
  - Latency and uptime SLOs for voice and API endpoints.
  - Resource usage (CPU, memory, DB connections).

---

## 7. Quality, Security, and Compliance

### 7.1 Testing Strategy

**Tasks**

- Backend:
  - Unit tests for core logic modules (e.g., health checks, config, LLM/document/payment integration) and any data models.
  - Integration tests for auth, sessions, and lessons using httpx + pytest.
- Frontend:
  - Component tests for core UI (chat, curriculum, dashboards).
  - E2E flows for login, start lesson, start exam, and view progress.
- Voice stack:
  - Automated sanity checks for LiveKit connectivity and audio round-trip.

### 7.2 Security & Privacy

**Tasks**

- Apply OWASP-aligned input validation and output encoding.
- Enforce HTTPS everywhere via Caddy; HSTS and secure cookies where applicable.
- Implement appropriate retention and deletion policies for audio and transcripts.
- Document data handling for learners and institutions (for future legal review).

---

## 8. Rollout Plan

### 8.1 Internal Alpha

- Run the app stack locally for the core team.
- Validate foundational flows: auth, voice session, basic lesson, transcripts.

### 8.2 Closed Beta

- Onboard a small cohort of A2–B2 learners and one or two programs.
- Focus on curriculum fit, tutor behavior, and exam-simulation realism.
- Collect data to refine prompt templates, lesson design, and dashboard UX.

### 8.3 Public Launch

- Enable subscription plans and payment.
- Turn on production monitoring and on-call processes.
- Iterate on performance, reliability, and content quality based on real usage.