# Avaaz Implementation Plan

This implementation plan translates the Product Requirements (`docs/PRD.md`), the product description (`README.md`), and the system architecture (`docs/architecture.md`) into concrete, phased engineering work for a production-grade Avaaz deployment.

The goal is to deliver an end-to-end, voice-first AI speaking coach that supports learners from A1 to B2, with B2 oral exam readiness as the primary outcome.

---

## 1. Guiding Principles

- **B2 exam readiness first, A1–B2 capable:** Design features and data models so that they support A1–B2 learners, but prioritize workflows that move learners toward B2 oral exam success.
- **Voice-first, text-strong:** Optimize for real-time speech-to-speech interactions, with robust transcripts and text UX as first-class companions.
- **Single source of truth:** Keep curriculum, lessons, transcripts, and analytics centralized in PostgreSQL + pgvector; no separate vector store.
- **Continuous sessions:** All conversations run within persistent sessions (`/sessions/default`), preserving state across reconnects.
- **Infrastructure parity:** Development Docker stack mirrors production VPS stacks (infra/app), as described in `docs/architecture.md`.
- **Security and privacy:** Apply strong auth, least-privilege access, safe logging, and clear retention policies for voice/transcript data.

---

## 2. High-Level Phasing

### Phase 1 – Foundation (M0–M2)

- Set up core infrastructure (Dockerized backend, frontend, LiveKit, Postgres + pgvector, Caddy).
- Implement authentication, user model, and basic session handling.
- Implement a minimal voice conversation loop (user ↔ AI tutor) with basic transcripts.
- Define the initial CEFR-aware curriculum data model and seed a small set of lessons.

### Phase 2 – Learning Experience & Analytics (M3–M4)

- Implement the full A1–B2 curriculum representation, scenarios, and level-aware adaptive tutoring.
- Add a progress dashboard, gamification basics, and post-session summaries.
- Implement AI-assisted lesson authoring and learner-upload-based lessons.
- Introduce mock exam templates (A1–A2, A2–B1, B1–B2) and B2-focused exam reports.

### Phase 3 – Scale, Reliability & Monetization (M5–M6)

- Harden infrastructure (observability, health checks, admin dashboards).
- Add subscription plans and Stripe integration.
- Optimize performance (latency, concurrency), tune analytics pipelines, and finalize launch-readiness tasks.

---

## 3. Backend Workstream (FastAPI + LiveKit + LLMs)

### 3.1 Core Service Setup

**Goals**

- Production-ready FastAPI service with auth, sessions, and integrations.

**Tasks**

- Use the existing backend layout under `app/backend` as the foundation:
  - `app/backend/main.py` – app factory and router wiring.
  - `app/backend/core/config.py` – pydantic-settings for core configuration: DB URL, LLM keys, LiveKit, Stripe, etc.
  - `app/backend/core/database.py` – database/session utilities; extend to add SQLAlchemy, pgvector, and Alembic integration.
  - `app/backend/api/v1/router.py` – versioned API router aggregator; include routers from feature and operation modules (existing `features.auth`, `operations.health`, plus future `lessons`, `chat`, `documents`).
  - `app/backend/features/*` and `app/backend/operations/*` – domain logic and HTTP routers (e.g., auth, lessons, chat, payments, document upload, health).
- Implement base middleware (CORS, logging, request ID, error handling).
- Ensure `/health`, `/health/live`, and `/health/ready` endpoints are wired and return basic dependency checks (DB connectivity, LiveKit reachability, LLM connectivity where safe).

**Deliverables**

- Running FastAPI service in Docker with `/health` OK and OpenAPI docs available.

### 3.2 Data Model & Persistence

**Goals**

- Support A1–B2 curriculum, lessons, sessions, transcripts, and analytics in PostgreSQL + pgvector.

**Tasks**

- Design and implement SQLAlchemy models:
  - `User` – profile, locale, target level, subscription plan, preferences.
  - `CurriculumObjective` – per level (A1–B2), skill (reception, production, interaction, mediation), descriptor text.
  - `Lesson` – CEFR level, objectives, type (lesson, scenario, exam part), metadata (topic, context).
  - `Scenario` / `ScenarioStep` – structured oral tasks (self-presentation, picture description, opinion exchange, arguing a statement) with configuration for timing and mode (individual/pair).
  - `Session` – persistent conversational session per user (mode, state, summary, last_activity_at).
  - `Turn` – individual utterances with role (user/AI), timestamps, raw transcript, audio reference, CEFR difficulty metadata.
  - `ExamTemplate` / `ExamPart` – A1–A2, A2–B1, B1–B2 templates with timing, task types, scoring dimensions.
  - `ExamAttempt` / `ExamScore` – attempt metadata, estimated CEFR level, component scores.
  - `UploadDocument` / `DocumentChunk` – files and parsed chunks with `vector` embeddings (stored alongside or extending the existing backend package under `app/backend`).
  - `ProgressSnapshot` – aggregate metrics for dashboards (per user and optionally per program).
  - `Subscription` / `PaymentEvent` – billing state and usage limits.
- Add related Alembic migrations; verify they run cleanly on the dev DB.

**Deliverables**

- Migrations and models aligned with PRD feature set and architecture.

### 3.3 Authentication & User Management

**Goals**

- Secure user auth using FastAPI Users (or equivalent) with JWT and refresh tokens.

**Tasks**

- Configure FastAPI Users:
  - Email/password registration, login, password reset, email verification.
  - Role support (learner, instructor, admin) for curriculum authoring and admin dashboards.
- Integrate auth into routes (`dependencies.py` with `current_user`).
- Implement `users` endpoints for profile (target CEFR level, locale) and preferences (greeting verbosity, data retention preferences).

**Deliverables**

- Auth flows working in backend and callable from Postman / curl.

### 3.4 Session Management & Transcripts

**Goals**

- Provide continuous session behavior with persistent history, as described in `docs/architecture.md` and the PRD.

**Tasks**

- Implement `GET /sessions/default`:
  - Create or fetch the `Session` for the current user.
  - Load summary, current lesson state, and progress context.
- Implement `POST /sessions/default/token`:
  - Generate a short-lived LiveKit token with room identity tied to the session.
- Integrate with the LiveKit Agent:
  - Implement an LLM integration module (for example under `app/backend/features/llm.py` or similar) that configures the realtime session using the historical summary, current goals, and mode (lesson/mock exam/free).
- Implement transcript persistence:
  - Receive partial/final transcripts from LiveKit/agent.
  - Append `Turn` records and maintain rolling summaries for context.
  - Respect retention settings.
- Implement a post-session summarization endpoint / background job:
  - Generate a per-session summary, strengths/weaknesses, and recommended next steps.
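
The short-lived LiveKit token is a JWT carrying a video grant. Production code would normally mint it with the LiveKit server SDK; this hand-rolled PyJWT sketch just shows the claim shape (claim names follow LiveKit's documented token format; the TTL default is an assumption):

```python
# Sketch of the token minted by POST /sessions/default/token.
import time
import jwt  # PyJWT

def mint_livekit_token(api_key: str, api_secret: str, identity: str,
                       room: str, ttl_seconds: int = 600) -> str:
    now = int(time.time())
    claims = {
        "iss": api_key,            # LiveKit API key identifies the issuer
        "sub": identity,           # participant identity tied to the session
        "nbf": now,
        "exp": now + ttl_seconds,  # short-lived, per the plan
        "video": {"roomJoin": True, "room": room},
    }
    return jwt.encode(claims, api_secret, algorithm="HS256")
```

Because the token is signed with the LiveKit API secret, the frontend never needs credentials beyond this expiring JWT.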

**Deliverables**

- API and background flows that maintain continuous conversational context per user.

### 3.5 Curriculum & Lesson APIs

**Goals**

- Expose CEFR-aligned curriculum and lesson content to the frontend and agent.

**Tasks**
- Implement `lessons` router:
|
||
- List lessons by level, topic, recommended next steps.
|
||
- Fetch details for a specific lesson, including objectives and scenario steps.
|
||
- Mark lesson progress and completion; update `ProgressSnapshot`.
|
||
- Implement endpoints for curriculum objectives and mapping to lessons.
|
||
- Implement endpoints to retrieve scenario templates for mock exams and regular lessons.
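
The "recommended next steps" listing could rank lessons roughly as in this sketch; the `LessonOut` shape and the ranking rule (incomplete lessons at or below the target level, closest to the target first) are illustrative assumptions:

```python
# Sketch of the ranking behind "recommended next lessons".
from dataclasses import dataclass

CEFR_ORDER = ["A1", "A2", "B1", "B2"]

@dataclass
class LessonOut:
    id: int
    title: str
    cefr_level: str  # "A1".."B2"
    completed: bool = False

def recommend_next(lessons: list[LessonOut], target_level: str,
                   limit: int = 3) -> list[LessonOut]:
    """Suggest incomplete lessons at or below the target level, with lessons
    closest to the target level first (a B2-bound learner sees B2 work early)."""
    cutoff = CEFR_ORDER.index(target_level)
    pool = [l for l in lessons
            if not l.completed and CEFR_ORDER.index(l.cefr_level) <= cutoff]
    pool.sort(key=lambda l: (-CEFR_ORDER.index(l.cefr_level), l.id))
    return pool[:limit]
```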

**Deliverables**

- Stable JSON API for curriculum and lessons, used by the frontend and agent system prompts.

### 3.6 AI-Assisted Authoring & User Uploads

**Goals**

- Support instructor-designed and learner-generated lessons from uploaded materials.

**Tasks**
- Implement `documents` router:
|
||
- File upload endpoints for documents and images (instructor and learner scopes).
|
||
- Trigger document processing pipeline (Docling or similar) to parse text and structure.
|
||
- Chunk documents and store embeddings in `DocumentChunk` using pgvector.
|
||
- Implement instructor authoring endpoints:
|
||
- Create/update/delete lessons referencing uploaded documents/images.
|
||
- AI-assisted suggestion endpoint that uses LLM to propose lesson structure, prompts, and exam-style tasks conditioned on level and objectives.
|
||
- Implement learner upload endpoints:
|
||
- User-specific upload and lesson creation (on-the-fly lessons).
|
||
- Link created “ad-hoc” lessons to sessions so the tutor can reference them during practice.
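
The chunk-and-embed step might look like this sketch. `chunk_text` is a naive character chunker (real code would split on the structure of the Docling parse instead), and the SQL fragment shows the pgvector insert shape with assumed table/column names:

```python
# Sketch of document chunking ahead of embedding + pgvector storage.
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Greedy character chunking with overlap, so context at chunk
    boundaries is not lost for retrieval."""
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

# Storing a chunk with its embedding (psycopg-style placeholders; table and
# column names are assumptions matching the DocumentChunk model):
INSERT_CHUNK_SQL = """
INSERT INTO document_chunks (document_id, chunk_index, content, embedding)
VALUES (%s, %s, %s, %s::vector)
"""
```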

**Deliverables**

- Endpoints supporting both admin/instructor authoring and user-driven contextual lessons.

### 3.7 Mock Exam Engine & Scoring

**Goals**

- Implement configurable mock oral exam flows for A1–A2, A2–B1, and B1–B2, with a B2 focus.

**Tasks**
- Implement exam orchestration service:
|
||
- Given an `ExamTemplate`, manage progression through `ExamPart`s (including warm-up, individual tasks, pair tasks).
|
||
- Enforce timing and mode flags to drive agent prompts.
|
||
- Integrate scoring:
|
||
- Use LLM to derive component scores (fluency, pronunciation, grammar, vocabulary, coherence) from transcripts.
|
||
- Map to estimated CEFR band and store in `ExamScore`.
|
||
- Expose endpoints:
|
||
- Start exam, fetch exam status, retrieve past exam results.
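
Mapping LLM-derived component scores to a CEFR band could follow this shape; the 0–5 scale and the band thresholds are uncalibrated placeholders, not values from the PRD:

```python
# Sketch of score aggregation and CEFR band estimation.
from dataclasses import dataclass

@dataclass
class ExamScore:
    fluency: float
    pronunciation: float
    grammar: float
    vocabulary: float
    coherence: float

    def overall(self) -> float:
        parts = [self.fluency, self.pronunciation, self.grammar,
                 self.vocabulary, self.coherence]
        return sum(parts) / len(parts)

def estimate_cefr_band(score: ExamScore) -> str:
    # Thresholds are placeholders; real cutoffs need calibration against
    # rated exam transcripts.
    bands = [(4.2, "B2"), (3.2, "B1"), (2.2, "A2"), (0.0, "A1")]
    overall = score.overall()
    for threshold, band in bands:
        if overall >= threshold:
            return band
    return "A1"
```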

**Deliverables**

- End-to-end exam session that runs via the same LiveKit + agent infrastructure and stores exam results.

### 3.8 Analytics & Reporting

**Goals**

- Provide learner-level dashboards and program-level reporting.

**Tasks**
- Implement periodic aggregation (cron/async tasks) populating `ProgressSnapshot`.
|
||
- Implement analytics endpoints:
|
||
- Learner metrics (minutes spoken, session counts, trends per skill).
|
||
- Program-level metrics (for instructors/coordinators) with appropriate role-based access.
|
||
- Ensure privacy controls (anonymized or pseudonymized data where required).
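
The aggregation that fills `ProgressSnapshot` reduces raw turns to per-day speaking minutes, roughly like this sketch (the tuple shape stands in for rows queried from the `turns` table):

```python
# Sketch of the "minutes spoken per day" aggregation.
from collections import defaultdict
from datetime import date

def aggregate_minutes(turns: list[tuple[date, str, float]]) -> dict[date, float]:
    """turns: (day, role, seconds). Sums only learner ("user") speaking time,
    since AI turns should not inflate the learner's practice minutes."""
    per_day: dict[date, float] = defaultdict(float)
    for day, role, seconds in turns:
        if role == "user":
            per_day[day] += seconds / 60.0
    return dict(per_day)
```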

**Deliverables**

- Backend API supporting progress dashboards and reports as per the PRD.

---

## 4. Frontend Workstream (Next.js + LiveKit)

### 4.1 Foundation & Layout

**Goals**

- Production-ready Next.js PWA frontend that matches Avaaz branding and supports auth, routing, and basic pages.

**Tasks**

- Initialize the Next.js app (per `README.md`):
  - Configure `next.config.js`, TypeScript, ESLint, the PWA manifest, and global styles.
  - Implement `app/layout.tsx` with theme, localization provider, and navigation.
  - Implement the `app/page.tsx` landing page aligned with product positioning (A1–B2, B2 focus).
- Implement auth pages (login, register, email verification, forgot password).

**Deliverables**

- Frontend skeleton running under Docker, reachable via Caddy in the dev stack.

### 4.2 Chat & Voice Experience

**Goals**

- Voice-first conversational UI integrated with LiveKit and backend sessions.

**Tasks**
- Build `ChatInterface.tsx`:
|
||
- Microphone controls, connection status, basic waveform/level visualization.
|
||
- Rendering of AI and user turns with text and visual aids (images, tables) as provided by backend/agent.
|
||
- Error states for mic and network issues; text-only fallback UI.
|
||
- Integrate with backend session APIs:
|
||
- On login, call `GET /sessions/default`, then `POST /sessions/default/token`.
|
||
- Connect to LiveKit using the token; handle reconnection logic.
|
||
- Display contextual greeting and summary on session start using data returned from `sessions` API.

**Deliverables**

- Usable chat interface capable of sustaining real-time conversation with the AI tutor.

### 4.3 Curriculum & Lesson UX

**Goals**

- Allow learners to browse the curriculum, start lessons, and view progress.

**Tasks**

- Implement the curriculum overview page:
  - Display modules and lessons grouped by CEFR level (A1–B2).
  - Indicate completion and recommended next lessons.
- Implement the lesson detail page:
  - Show lesson goals, target level, estimated time, and exam-related tags.
  - Start lesson → opens the chat view in the appropriate mode with lesson context.
- Integrate progress indicators (streaks, minutes, CEFR band) into the dashboard.

**Deliverables**

- Navigation and views covering the core learning flows described in the PRD.

### 4.4 Mock Exam UX

**Goals**

- Implement exam-specific UX consistent with oral exams and the PRD.

**Tasks**

- Build the exam selection page:
  - Allow the user to choose an exam level (A1–A2, A2–B1, B1–B2/B2 mock).
- In-session exam UI:
  - Show the current exam part, a timer, and appropriate instructions.
  - Indicate whether the current part is scored or warm-up.
- Results page:
  - Show the estimated CEFR level, component scores, and textual feedback.
  - Provide links to detailed transcripts, audio, and recommended follow-up lessons.

**Deliverables**

- End-to-end exam flow from selection to results.

### 4.5 AI-Assisted Authoring & Upload UX

**Goals**

- Provide UIs for instructors and learners to upload content and create lessons.

**Tasks**

- Instructor interface:
  - Lesson builder UI with level, objectives, exam part, and document selection.
  - “Generate with AI” action to fetch suggested prompts/structure; edit in place and publish.
- Learner interface:
  - Simple upload flow (document/image) to create ad-hoc practice.
  - Quick-start buttons to jump from uploaded content to a tailored lesson in the chat interface.

**Deliverables**

- Authoring tools that map onto the backend authoring APIs.

### 4.6 Analytics & Admin Health Dashboard

**Goals**

- Provide admin and instructor dashboards for system health and learner analytics.

**Tasks**

- Learner dashboard:
  - Visualize key metrics and streaks, integrated with backend analytics.
- Instructor/program dashboard:
  - Aggregate usage and progress metrics for groups.
- Admin health dashboard:
  - Surface backend `/health` status, LiveKit status, DB health indicators, and LLM connectivity signals.

**Deliverables**

- Dashboards that satisfy the PRD’s analytics and health visibility requirements.

---

## 5. Real-Time & Media Workstream (LiveKit + Agents)

### 5.1 LiveKit Server & Config

**Tasks**

- Use the existing `livekit` service in `app/docker-compose.yml` as the basis: keep signaling on port 7880, keep the WebRTC media port range configurable via environment variables (currently defaulting to 60000–60100), and keep the service attached to the shared `proxy` network used by Caddy.
- Ensure secure API keys and appropriate room/track settings for voice-only sessions.
- Configure UDP ports and signaling endpoints (`rtc.avaaz.ai` → Caddy → `livekit:7880`) as described in `docs/architecture.md` and `infra/Caddyfile`.

### 5.2 Client Integration

**Tasks**

- Wire the frontend to LiveKit:
  - Use the `livekit-client` SDK to join rooms using tokens from the backend.
  - Handle reconnection and session resumption.
- Integrate with backend session and agent orchestration.

### 5.3 Agent Integration with Realtime LLMs

**Tasks**

- Implement a LiveKit Agent that:
  - Connects to OpenAI Realtime or Gemini Live according to configuration.
  - Streams user audio and receives streamed AI audio and partial transcripts.
  - Forwards transcripts and metadata to the backend for persistence.
- Implement prompt templates for:
  - Regular lessons, mock exams, and free conversation.
  - CEFR-level adaptation and exam-specific tasks.

**Deliverables**

- Stable real-time pipeline from user microphone to LLM and back, integrated with backend logic.

---

## 6. Infrastructure & DevOps Workstream

### 6.1 Docker & Compose

**Tasks**
- Define and refine:
  - `infra/docker-compose.yml` – infra stack (Caddy, Gitea, Gitea runner).
  - `app/docker-compose.yml` – app stack (frontend, backend, LiveKit, Postgres + pgvector).
- Configure volumes and networks (`proxy` network for routing via Caddy).

### 6.2 CI & CD (Gitea + Actions)

**Tasks**

- CI:
  - Extend `.gitea/workflows/ci.yml` to run linting, type-checking, and tests for the backend and frontend once those projects are scaffolded under `app/backend` and `app/frontend`.
  - Add build verification for any Docker images produced for the app stack.
- CD:
  - Use `.gitea/workflows/cd.yml` as the tag-based deploy workflow, following the deployment approach in `docs/architecture.md`.
  - Deploy tags `v*` only if they are on `main`.
  - Use `/health` and key endpoints for readiness checks; roll back on failures.
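
The post-deploy readiness gate can be a simple poll against `/health` before the workflow declares success or rolls back; the URL and timings here are placeholders:

```python
# Sketch of the deploy-time health gate: poll until healthy or give up.
import time
import urllib.request

def wait_healthy(url: str, timeout_s: float = 120, interval_s: float = 5) -> bool:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # service not up yet (or unreachable); retry
        time.sleep(interval_s)
    return False  # caller triggers rollback
```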

### 6.3 Observability & Monitoring

**Tasks**

- Centralize logs and metrics for the backend, frontend, LiveKit, and Postgres.
- Configure alerting for:
  - Application errors.
  - Latency and uptime SLOs for voice and API endpoints.
  - Resource usage (CPU, memory, DB connections).
---
|
||
|
||
## 7. Quality, Security, and Compliance
|
||
|
||
### 7.1 Testing Strategy
|
||
|
||
**Tasks**
|
||
|
||
- Backend:
|
||
- Unit tests for core logic modules (e.g., health checks, config, LLM/document/payment integration) and any data models.
|
||
- Integration tests for auth, sessions, and lessons using httpx + pytest.
|
||
- Frontend:
|
||
- Component tests for core UI (chat, curriculum, dashboards).
|
||
- E2E flows for login, start lesson, start exam, and view progress.
|
||
- Voice stack:
|
||
- Automated sanity checks for LiveKit connectivity and audio round-trip.
|
||
|
||
### 7.2 Security & Privacy

**Tasks**

- Apply OWASP-aligned input validation and output encoding.
- Enforce HTTPS everywhere via Caddy; HSTS and secure cookies where applicable.
- Implement appropriate retention and deletion policies for audio and transcripts.
- Document data handling for learners and institutions (for future legal review).

---

## 8. Rollout Plan

### 8.1 Internal Alpha

- Run the app stack locally for the core team.
- Validate foundational flows: auth, voice session, basic lesson, transcripts.

### 8.2 Closed Beta

- Onboard a small cohort of A2–B2 learners and one or two programs.
- Focus on curriculum fit, tutor behavior, and exam simulation realism.
- Collect data to refine prompt templates, lesson design, and dashboard UX.

### 8.3 Public Launch

- Enable subscription plans and payment.
- Turn on production monitoring and on-call processes.
- Iterate on performance, reliability, and content quality based on real usage.