Avaaz Implementation Plan

This implementation plan translates the Product Requirements (docs/PRD.md), product description (README.md), and system architecture (docs/architecture.md) into concrete, phased engineering work for a production-grade Avaaz deployment.

The goal is to deliver an end-to-end, voice-first AI speaking coach that supports learners from A1 to B2, with B2 oral exam readiness as the primary outcome.


1. Guiding Principles

  • B2 exam readiness first, A1–B2 capable: Design features and data models so that they support A1–B2 learners, but prioritize workflows that move learners toward B2 oral exam success.
  • Voice-first, text-strong: Optimize for real-time speech-to-speech interactions, with robust transcripts and text UX as first-class companions.
  • Single source of truth: Keep curriculum, lessons, transcripts, and analytics centralized in PostgreSQL + pgvector; no separate vector store.
  • Continuous sessions: All conversations run within persistent sessions (/sessions/default), preserving state across reconnects.
  • Infrastructure parity: Development Docker stack mirrors production VPS stacks (infra/app), as described in docs/architecture.md.
  • Security and privacy: Apply strong auth, least-privilege access, safe logging, and clear retention policies for voice/transcript data.

2. High-Level Phasing

Phase 1: Foundation (M0–M2)

  • Set up core infrastructure (Dockerized backend, frontend, LiveKit, Postgres+pgvector, Caddy).
  • Implement authentication, user model, and basic session handling.
  • Implement minimal voice conversation loop (user ↔ AI tutor) with basic transcripts.
  • Define initial CEFR-aware curriculum data model and seed a small set of lessons.

Phase 2: Learning Experience & Analytics (M3–M4)

  • Implement full A1–B2 curriculum representation, scenarios, and level-aware adaptive tutoring.
  • Add progress dashboard, gamification basics, and post-session summaries.
  • Implement AI-assisted lesson authoring and learner-upload-based lessons.
  • Introduce mock exam templates (A1–A2, A2–B1, B1–B2) and B2-focused exam reports.

Phase 3: Scale, Reliability & Monetization (M5–M6)

  • Harden infrastructure (observability, health checks, admin dashboards).
  • Add subscription plans and Stripe integration.
  • Optimize performance (latency, concurrency), tune analytics pipelines, and finalize launch-readiness tasks.

3. Backend Workstream (FastAPI + LiveKit + LLMs)

3.1 Core Service Setup

Goals

  • Production-ready FastAPI service with auth, sessions, and integrations.

Tasks

  • Use the existing backend layout under app/backend as the foundation:
    • app/backend/main.py: app factory and router wiring.
    • app/backend/core/config.py: pydantic-settings for core configuration (DB_URL, LLM keys, LiveKit, Stripe, etc.).
    • app/backend/core/database.py: database/session utilities; extend to add SQLAlchemy, pgvector, and Alembic integration.
    • app/backend/api/v1/router.py: versioned API router aggregator; include routers from feature and operation modules (existing features.auth, operations.health, plus future lessons, chat, documents).
    • app/backend/features/* and app/backend/operations/*: domain logic and HTTP routers (e.g., auth, lessons, chat, payments, document upload, health).
  • Implement base middleware (CORS, logging, request ID, error handling).
  • Ensure /health, /health/live, and /health/ready endpoints are wired and return basic dependency checks (DB connectivity, LiveKit reachability, LLM connectivity where safe); a sketch follows this list.
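
A minimal sketch of this wiring, assuming an async SQLAlchemy engine and a get_db_session dependency exposed by app/backend/core/database.py (both names are placeholders):

```python
# app/backend/operations/health.py (sketch; dependency names are assumptions)
from fastapi import APIRouter, Depends
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

from app.backend.core.database import get_db_session  # assumed helper name

router = APIRouter()

@router.get("/health")
@router.get("/health/live")
async def live() -> dict:
    # Liveness only: the process is up; no dependency checks here.
    return {"status": "ok"}

@router.get("/health/ready")
async def ready(db: AsyncSession = Depends(get_db_session)) -> dict:
    # Readiness: fail if the database cannot answer a trivial query.
    await db.execute(text("SELECT 1"))
    return {"status": "ready", "db": "ok"}
```

Keeping /health/live dependency-free means the liveness probe never flaps when a downstream service degrades; only readiness does.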

Deliverables

  • Running FastAPI service in Docker with /health OK and OpenAPI docs available.

3.2 Data Model & Persistence

Goals

  • Support A1–B2 curriculum, lessons, sessions, transcripts, and analytics in PostgreSQL + pgvector.

Tasks

  • Design and implement SQLAlchemy models (a short sketch follows this list):
    • User: profile, locale, target level, subscription plan, preferences.
    • CurriculumObjective: per level (A1–B2), skill (reception, production, interaction, mediation), descriptor text.
    • Lesson: CEFR level, objectives, type (lesson, scenario, exam part), metadata (topic, context).
    • Scenario / ScenarioStep: structured oral tasks (self-presentation, picture description, opinion exchange, arguing a statement) with configuration for timing and mode (individual/pair).
    • Session: persistent conversational session per user (mode, state, summary, last_activity_at).
    • Turn: individual utterances with role (user/AI), timestamps, raw transcript, audio reference, and CEFR difficulty metadata.
    • ExamTemplate / ExamPart: A1–A2, A2–B1, and B1–B2 templates with timing, task types, and scoring dimensions.
    • ExamAttempt / ExamScore: attempt metadata, estimated CEFR level, component scores.
    • UploadDocument / DocumentChunk: files and parsed chunks with vector embeddings (stored alongside or extending the existing backend package under app/backend).
    • ProgressSnapshot: aggregate metrics for dashboards (per user and optionally per program).
    • Subscription / PaymentEvent: billing state and usage limits.
    • Note: Seed the database with the specific plans defined in README.md (First Light, Spark, Glow, Shine, Radiance) and their respective limits.
  • Add related Alembic migrations; verify they run cleanly on dev DB.
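
A sketch of two of these models in SQLAlchemy 2.0 style; column names and the embedding dimension are assumptions to be settled during implementation:

```python
import uuid
from datetime import datetime

from pgvector.sqlalchemy import Vector
from sqlalchemy import ForeignKey, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Session(Base):
    __tablename__ = "sessions"
    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    user_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("users.id"), index=True)
    mode: Mapped[str]  # lesson | mock_exam | free
    summary: Mapped[str | None] = mapped_column(Text)
    last_activity_at: Mapped[datetime | None]

class DocumentChunk(Base):
    __tablename__ = "document_chunks"
    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    document_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("upload_documents.id"), index=True
    )
    content: Mapped[str] = mapped_column(Text)
    # Embedding dimension depends on the chosen model; 1536 is an assumption.
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))
```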

Deliverables

  • Migrations and models aligned with PRD feature set and architecture.

3.3 Authentication & User Management

Goals

  • Secure user auth using FastAPI Users (or equivalent) with JWT and refresh tokens.

Tasks

  • Configure FastAPI Users (sketched after this list):
    • Email/password registration, login, password reset, email verification.
    • Role support (learner, instructor, admin) for curriculum authoring and admin dashboards.
  • Integrate auth into routes (dependencies.py with current_user).
  • Implement users endpoints for profile (target CEFR level, locale) and preferences (greeting verbosity, data retention preferences).
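
A sketch of the FastAPI Users wiring; the User model and get_user_manager locations are assumptions:

```python
import uuid

from fastapi_users import FastAPIUsers
from fastapi_users.authentication import (
    AuthenticationBackend,
    BearerTransport,
    JWTStrategy,
)

from app.backend.features.auth import User, get_user_manager  # assumed location

bearer_transport = BearerTransport(tokenUrl="auth/jwt/login")

def get_jwt_strategy() -> JWTStrategy:
    # The secret should come from app/backend/core/config.py in practice.
    return JWTStrategy(secret="CHANGE_ME", lifetime_seconds=3600)

auth_backend = AuthenticationBackend(
    name="jwt",
    transport=bearer_transport,
    get_strategy=get_jwt_strategy,
)

fastapi_users = FastAPIUsers[User, uuid.UUID](get_user_manager, [auth_backend])

# Reusable dependency for protected routes (e.g., in dependencies.py).
current_user = fastapi_users.current_user(active=True)
```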

Deliverables

  • Auth flows working in backend and callable from Postman / curl.

3.4 Session Management & Transcripts

Goals

  • Provide continuous session behavior with persistent history, as described in docs/architecture.md and PRD.

Tasks

  • Implement GET /sessions/default:
    • Create or fetch Session for current user.
    • Load summary, current lesson state, and progress context.
  • Implement POST /sessions/default/token:
    • Generate a short-lived LiveKit token with room identity tied to the session (see the sketch after this list).
  • Integrate with LiveKit Agent:
    • Implement an LLM integration module (for example, app/backend/features/llm.py) that configures the realtime session using the historical summary, current goals, and mode (lesson/mock exam/free).
  • Implement transcript persistence:
    • Receive partial/final transcripts from LiveKit/agent.
    • Append Turn records and maintain rolling summaries for context.
    • Respect retention settings.
  • Implement post-session summarization endpoint / background job:
    • Generate per-session summary, strengths/weaknesses, recommended next steps.
  • Implement on-demand translation:
    • Endpoint (e.g., /chat/translate) or integrated socket message to translate user/AI text between target and native languages (supporting PRD Section 4.2).
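
A sketch of the token endpoint using the livekit-api package; the room-naming scheme and settings attribute names are assumptions:

```python
from datetime import timedelta

from fastapi import APIRouter, Depends
from livekit import api

from app.backend.core.config import settings            # assumed settings object
from app.backend.api.dependencies import current_user   # assumed location

router = APIRouter()

@router.post("/sessions/default/token")
async def session_token(user=Depends(current_user)) -> dict:
    token = (
        api.AccessToken(settings.livekit_api_key, settings.livekit_api_secret)
        .with_identity(str(user.id))
        .with_ttl(timedelta(minutes=10))  # short-lived, per this plan
        .with_grants(api.VideoGrants(room_join=True, room=f"session-{user.id}"))
    )
    return {"token": token.to_jwt()}
```

Short TTLs keep leaked tokens low-risk; on reconnect the frontend simply requests a fresh token for the same session room.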

Deliverables

  • API and background flows that maintain continuous conversational context per user.

3.5 Curriculum & Lesson APIs

Goals

  • Expose CEFR-aligned curriculum and lesson content to frontend and agent.

Tasks

  • Implement lessons router (sketched after this list):
    • List lessons by level, topic, recommended next steps.
    • Fetch details for a specific lesson, including objectives and scenario steps.
    • Mark lesson progress and completion; update ProgressSnapshot.
  • Implement endpoints for curriculum objectives and mapping to lessons.
  • Implement endpoints to retrieve scenario templates for mock exams and regular lessons.
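
A sketch of the listing endpoint; Lesson field names follow Section 3.2, but the query helpers are assumptions:

```python
from fastapi import APIRouter, Depends
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.backend.core.database import get_db_session    # assumed helper name
from app.backend.features.lessons.models import Lesson  # assumed location

router = APIRouter(prefix="/lessons")

@router.get("")
async def list_lessons(
    level: str | None = None,  # CEFR filter: A1, A2, B1, or B2
    topic: str | None = None,
    db: AsyncSession = Depends(get_db_session),
) -> list[dict]:
    query = select(Lesson)
    if level:
        query = query.where(Lesson.cefr_level == level)
    if topic:
        query = query.where(Lesson.topic == topic)
    lessons = (await db.execute(query)).scalars().all()
    return [
        {"id": str(lesson.id), "title": lesson.title, "level": lesson.cefr_level}
        for lesson in lessons
    ]
```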

Deliverables

  • Stable JSON API for curriculum and lessons, used by frontend and agent system prompts.

3.6 AI-Assisted Authoring & User Uploads

Goals

  • Support instructor-designed and learner-generated lessons from uploaded materials.

Tasks

  • Implement documents router:
    • File upload endpoints for documents and images (instructor and learner scopes).
    • Trigger document processing pipeline (Docling or similar) to parse text and structure.
    • Chunk documents and store embeddings in DocumentChunk using pgvector (see the sketch after this list).
  • Implement instructor authoring endpoints:
    • Create/update/delete lessons referencing uploaded documents/images.
    • AI-assisted suggestion endpoint that uses LLM to propose lesson structure, prompts, and exam-style tasks conditioned on level and objectives.
  • Implement learner upload endpoints:
    • User-specific upload and lesson creation (on-the-fly lessons).
    • Link created “ad-hoc” lessons to sessions so the tutor can reference them during practice.
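
A sketch of the chunk-and-embed step; chunk sizes are placeholders, and embed() stands in for whichever embedding provider is configured:

```python
from app.backend.features.documents.models import DocumentChunk  # assumed location

def chunk_text(text: str, max_chars: int = 1200, overlap: int = 200) -> list[str]:
    """Fixed-size character chunks with overlap; real input comes from Docling."""
    chunks, start = [], 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start : start + max_chars])
        start += step
    return chunks

async def store_chunks(db, document_id, text: str, embed) -> None:
    # embed: async callable str -> list[float]; provider-specific, assumed here.
    for piece in chunk_text(text):
        db.add(DocumentChunk(
            document_id=document_id,
            content=piece,
            embedding=await embed(piece),
        ))
    await db.commit()
```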

Deliverables

  • Endpoints supporting both admin/instructor authoring and user-driven contextual lessons.

3.7 Mock Exam Engine & Scoring

Goals

  • Implement configurable mock oral exam flows for A1–A2, A2–B1, and B1–B2, with B2 focus.

Tasks

  • Implement exam orchestration service:
    • Given an ExamTemplate, manage progression through ExamParts (including warm-up, individual tasks, pair tasks).
    • Enforce timing and mode flags to drive agent prompts.
  • Integrate scoring:
    • Use an LLM to derive component scores (fluency, pronunciation, grammar, vocabulary, coherence) from transcripts (see the sketch after this list).
    • Map to estimated CEFR band and store in ExamScore.
  • Expose endpoints:
    • Start exam, fetch exam status, retrieve past exam results.
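
A sketch of the scoring step; llm_json() is a hypothetical async helper (str -> dict) wrapping the configured LLM, and the band mapping is a naive placeholder pending rubric calibration:

```python
from pydantic import BaseModel, Field

class ComponentScores(BaseModel):
    fluency: float = Field(ge=0, le=5)
    pronunciation: float = Field(ge=0, le=5)
    grammar: float = Field(ge=0, le=5)
    vocabulary: float = Field(ge=0, le=5)
    coherence: float = Field(ge=0, le=5)

CEFR_BANDS = ["A1", "A2", "B1", "B2"]

def estimate_band(scores: ComponentScores) -> str:
    # Average the five components (0-5) and bucket into the four target bands.
    mean = sum(scores.model_dump().values()) / 5
    return CEFR_BANDS[min(int(mean / 1.25), len(CEFR_BANDS) - 1)]

async def score_transcript(transcript: str, llm_json) -> tuple[ComponentScores, str]:
    raw = await llm_json(
        "Score this oral exam transcript from 0 to 5 on fluency, pronunciation, "
        "grammar, vocabulary, and coherence. Return JSON with those five keys.\n\n"
        + transcript
    )
    scores = ComponentScores.model_validate(raw)  # reject malformed LLM output
    return scores, estimate_band(scores)
```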

Deliverables

  • End-to-end exam session that runs via the same LiveKit + agent infrastructure and stores exam results.

3.8 Analytics & Reporting

Goals

  • Provide learner-level dashboards and program-level reporting.

Tasks

  • Implement periodic aggregation (cron/async tasks) populating ProgressSnapshot (sketched after this list).
  • Implement analytics endpoints:
    • Learner metrics (minutes spoken, session counts, trends per skill).
    • Program-level metrics (for instructors/coordinators) with appropriate role-based access.
  • Ensure privacy controls (anonymized or pseudonymized data where required).
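
A sketch of one aggregate feeding ProgressSnapshot; Turn.duration_seconds and a denormalized Turn.user_id are assumptions about the final schema:

```python
from datetime import datetime, timedelta, timezone

from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession

from app.backend.features.sessions.models import Turn  # assumed location

async def minutes_spoken_last_week(db: AsyncSession, user_id) -> float:
    since = datetime.now(timezone.utc) - timedelta(days=7)
    query = (
        select(func.coalesce(func.sum(Turn.duration_seconds), 0))
        .where(Turn.role == "user")
        .where(Turn.user_id == user_id)
        .where(Turn.created_at >= since)
    )
    return (await db.execute(query)).scalar_one() / 60.0
```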

Deliverables

  • Backend API supporting progress dashboards and reports as per PRD.

4. Frontend Workstream (Next.js + LiveKit)

4.1 Foundation & Layout

Goals

  • Production-ready Next.js PWA front-end that matches Avaaz branding and supports auth, routing, and basic pages.

Tasks

  • Initialize Next.js app (per README.md):
    • Configure next.config.js, TypeScript, ESLint, PWA manifest, and global styles.
    • Implement app/layout.tsx with theme, localization provider, and navigation.
    • Implement app/page.tsx, a landing page aligned with product positioning (A1–B2, B2 focus).
  • Implement auth pages (login, register, email verification, forgot password).

Deliverables

  • Frontend skeleton running under Docker, reachable via Caddy in dev stack.

4.2 Chat & Voice Experience

Goals

  • Voice-first conversational UI integrated with LiveKit and backend sessions.

Tasks

  • Build ChatInterface.tsx:
    • Microphone controls, connection status, basic waveform/level visualization.
    • Rendering of AI and user turns with text and visual aids (images, tables) as provided by backend/agent.
    • Translation Support: UI controls to translate specific messages on demand (toggle or click-to-translate).
    • Error states for mic and network issues; text-only fallback UI.
  • Integrate with backend session APIs:
    • On login, call GET /sessions/default, then POST /sessions/default/token.
    • Connect to LiveKit using the token; handle reconnection logic.
  • Display contextual greeting and summary on session start using data returned from sessions API.

Deliverables

  • Usable chat interface capable of sustaining real-time conversation with the AI tutor.

4.3 Curriculum & Lesson UX

Goals

  • Allow learners to browse curriculum, start lessons, and view progress.

Tasks

  • Implement curriculum overview page:
    • Display modules and lessons grouped by CEFR levels (A1–B2).
    • Indicate completion and recommended next lessons.
  • Implement lesson detail page:
    • Show lesson goals, target level, estimated time, and exam-related tags.
    • Start lesson → opens chat view in appropriate mode with lesson context.
  • Integrate progress indicators (streaks, minutes, CEFR band) into dashboard.

Deliverables

  • Navigation and views covering core learning flows described in PRD.

4.4 Mock Exam UX

Goals

  • Implement exam-specific UX consistent with oral exams and PRD.

Tasks

  • Build exam selection page:
    • Allow user to choose exam level (A1–A2, A2–B1, B1–B2/B2-mock).
  • In-session exam UI:
    • Show current exam part, timer, and appropriate instructions.
    • Indicate whether current part is scored or warm-up.
  • Results page:
    • Show estimated CEFR level, component scores, and textual feedback.
    • Provide links to detailed transcripts, audio, and recommended follow-up lessons.

Deliverables

  • End-to-end exam flow from selection to results.

4.5 AI-Assisted Authoring & Upload UX

Goals

  • Provide UIs for instructors and learners to upload content and create lessons.

Tasks

  • Instructor interface:
    • Lesson builder UI with level, objectives, exam part, and document selection.
    • “Generate with AI” action to fetch suggested prompts/structure; edit-in-place and publish.
  • Learner interface:
    • Simple upload flow (document/image) to create ad-hoc practice.
    • Quick-start buttons to jump from uploaded content to a tailored lesson in the chat interface.

Deliverables

  • Authoring tools that map onto backend authoring APIs.

4.6 Analytics & Admin Health Dashboard

Goals

  • Provide admin and instructor dashboards for system health and learner analytics.

Tasks

  • Learner dashboard:
    • Visualize key metrics and streaks, integrated with backend analytics.
  • Instructor/program dashboard:
    • Aggregate usage and progress metrics for groups.
  • Admin health dashboard:
    • Surface backend /health status, LiveKit status, DB health indicators, and LLM connectivity signals.

Deliverables

  • Dashboards that satisfy the PRD's analytics and health visibility requirements.

5. Real-Time & Media Workstream (LiveKit + Agents)

5.1 LiveKit Server & Config

Tasks

  • Use the existing livekit service in app/docker-compose.yml as the basis: keep signaling on port 7880, keep the WebRTC media port range configurable via environment variables (currently defaulting to 60000–60100), and keep the service attached to the shared proxy network used by Caddy.
  • Ensure secure API keys and appropriate room/track settings for voice-only sessions.
  • Configure UDP ports and signaling endpoints (rtc.avaaz.ai → Caddy → livekit:7880) as described in docs/architecture.md and infra/Caddyfile.

5.2 Client Integration

Tasks

  • Wire frontend to LiveKit:
    • Use @livekit/client to join rooms using tokens from backend.
    • Handle reconnection and session resumption.
  • Integrate with backend session and agent orchestration.

5.3 Agent Integration with Realtime LLMs

Tasks

  • Implement a LiveKit Agent (sketched after this list) that:
    • Connects to OpenAI Realtime or Gemini Live according to configuration.
    • Streams user audio and receives streamed AI audio and partial transcripts.
    • Forwards transcripts and metadata to backend for persistence.
  • Implement prompt templates for:
    • Regular lessons, mock exams, free conversation.
    • CEFR-level adaptation and exam-specific tasks.
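
A sketch of the agent entrypoint using the livekit-agents framework with its OpenAI Realtime plugin; the instructions text is a placeholder, and the plugin choice would come from configuration:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai

async def entrypoint(ctx: agents.JobContext) -> None:
    await ctx.connect()  # join the room created for the user's session
    session = AgentSession(
        # Swap in the Gemini Live plugin here when configuration selects it.
        llm=openai.realtime.RealtimeModel(),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are Avaaz, a CEFR-aware speaking coach."),
    )

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```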

Deliverables

  • Stable real-time pipeline from user microphone to LLM and back, integrated with backend logic.

6. Infrastructure & DevOps Workstream

6.1 Docker & Compose

Tasks

  • Define and refine:
    • infra/docker-compose.yml: infra stack (Caddy, Gitea, Gitea runner).
    • app/docker-compose.yml: app stack (frontend, backend, LiveKit, Postgres+pgvector).
  • Configure volumes and networks (proxy network for routing via Caddy).

6.2 CI & CD (Gitea + Actions)

Tasks

  • CI:
    • Extend .gitea/workflows/ci.yml to run linting, type-checking, and tests for backend and frontend once those projects are scaffolded under app/backend and app/frontend.
    • Add build verification for any Docker images produced for the app stack.
  • CD:
    • Use .gitea/workflows/cd.yml as the tag-based deploy workflow, following the deployment approach in docs/architecture.md.
    • Deploy tags v* only if they are on main.
    • Use /health and key endpoints for readiness checks; roll back on failures (see the sketch below).
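
A sketch of a post-deploy readiness gate the CD workflow could invoke; the URL and retry budget are assumptions:

```python
import sys
import time

import httpx

def wait_ready(url: str = "https://avaaz.ai/health/ready", attempts: int = 30) -> int:
    for _ in range(attempts):
        try:
            if httpx.get(url, timeout=5).status_code == 200:
                return 0
        except httpx.HTTPError:
            pass  # service may still be starting; retry
        time.sleep(5)
    return 1  # non-zero exit signals the workflow to roll back

if __name__ == "__main__":
    sys.exit(wait_ready())
```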

6.3 Observability & Monitoring

Tasks

  • Centralize logs and metrics for backend, frontend, LiveKit, and Postgres.
  • Configure alerting for:
    • Application errors.
    • Latency and uptime SLOs for voice and API endpoints.
    • Resource usage (CPU, memory, DB connections).

7. Quality, Security, and Compliance

7.1 Testing Strategy

Tasks

  • Backend:
    • Unit tests for core logic modules (e.g., health checks, config, LLM/document/payment integration) and any data models.
    • Integration tests for auth, sessions, and lessons using httpx + pytest (see the sketch after this list).
  • Frontend:
    • Component tests for core UI (chat, curriculum, dashboards).
    • E2E flows for login, start lesson, start exam, and view progress.
  • Voice stack:
    • Automated sanity checks for LiveKit connectivity and audio round-trip.
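
A sketch of one such integration test; it assumes a create_app() factory in app/backend/main.py and the anyio pytest plugin:

```python
import httpx
import pytest

from app.backend.main import create_app  # assumed factory name

@pytest.mark.anyio
async def test_health_ok() -> None:
    # Exercise the app in-process, without a running server.
    transport = httpx.ASGITransport(app=create_app())
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/health")
    assert response.status_code == 200
    assert response.json()["status"] == "ok"
```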

7.2 Security & Privacy

Tasks

  • Apply OWASP-aligned input validation and output encoding.
  • Enforce HTTPS everywhere via Caddy; HSTS and secure cookies where applicable.
  • Implement appropriate retention and deletion policies for audio and transcripts.
  • Document data handling for learners and institutions (for future legal review).

8. Rollout Plan

8.1 Internal Alpha

  • Run app stack locally for core team.
  • Validate foundational flows: auth, voice session, basic lesson, transcripts.

8.2 Closed Beta

  • Onboard a small cohort of A2–B2 learners and one or two programs.
  • Focus on curriculum fit, tutor behavior, and exam simulation realism.
  • Collect data to refine prompt templates, lesson design, and dashboard UX.

8.3 Public Launch

  • Enable subscription plans and payment.
  • Turn on production monitoring and on-call processes.
  • Iterate on performance, reliability, and content quality based on real usage.