Avaaz Implementation Plan

This implementation plan translates the Product Requirements (docs/PRD.md), product description (README.md), and system architecture (docs/architecture.md) into concrete, phased engineering work for a production-grade Avaaz deployment.

The goal is to deliver an end-to-end, voice-first AI speaking coach that supports learners from A1 to B2, with B2 oral exam readiness as the primary outcome.


1. Guiding Principles

  • B2 exam readiness first, A1–B2 capable: Design features and data models so that they support A1–B2 learners, but prioritize workflows that move learners toward B2 oral exam success.
  • Voice-first, text-strong: Optimize for real-time speech-to-speech interactions, with robust transcripts and text UX as first-class companions.
  • Single source of truth: Keep curriculum, lessons, transcripts, and analytics centralized in PostgreSQL + pgvector; no separate vector store.
  • Continuous sessions: All conversations run within persistent sessions (/sessions/default), preserving state across reconnects.
  • Infrastructure parity: Development Docker stack mirrors production VPS stacks (infra/app), as described in docs/architecture.md.
  • Security and privacy: Apply strong auth, least-privilege access, safe logging, and clear retention policies for voice/transcript data.

2. High-Level Phasing

Phase 1: Foundation (M0–M2)

  • Set up core infrastructure (Dockerized backend, frontend, LiveKit, Postgres+pgvector, Caddy).
  • Implement authentication, user model, and basic session handling.
  • Implement minimal voice conversation loop (user ↔ AI tutor) with basic transcripts.
  • Define initial CEFR-aware curriculum data model and seed a small set of lessons.

Phase 2: Learning Experience & Analytics (M3–M4)

  • Implement full A1–B2 curriculum representation, scenarios, and level-aware adaptive tutoring.
  • Add progress dashboard, gamification basics, and post-session summaries.
  • Implement AI-assisted lesson authoring and learner-upload-based lessons.
  • Introduce mock exam templates (A1–A2, A2–B1, B1–B2) and B2-focused exam reports.

Phase 3: Scale, Reliability & Monetization (M5–M6)

  • Harden infrastructure (observability, health checks, admin dashboards).
  • Add subscription plans and Stripe integration.
  • Optimize performance (latency, concurrency), tune analytics pipelines, and finalize launch-readiness tasks.

3. Backend Workstream (FastAPI + LiveKit + LLMs)

3.1 Core Service Setup

Goals

  • Production-ready FastAPI service with auth, sessions, and integrations.

Tasks

  • Use the existing backend layout under app/backend as the foundation:
    • app/backend/main.py: app factory and router wiring.
    • app/backend/core/config.py: pydantic-settings for core configuration (DB_URL, LLM keys, LiveKit, Stripe, etc.).
    • app/backend/core/database.py: database/session utilities; extend to add SQLAlchemy, pgvector, and Alembic integration.
    • app/backend/api/v1/router.py: versioned API router aggregator; include routers from feature and operation modules (existing features.auth, operations.health, plus future lessons, chat, documents).
    • app/backend/features/* and app/backend/operations/*: domain logic and HTTP routers (e.g., auth, lessons, chat, payments, document upload, health).
  • Implement base middleware (CORS, logging, request ID, error handling).
  • Ensure /health, /health/live, and /health/ready endpoints are wired and return basic dependency checks (DB connectivity, LiveKit reachability, LLM connectivity where safe); a sketch follows this list.
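
A minimal sketch of this wiring, assuming an async SQLAlchemy engine and a get_db_session dependency exposed by app/backend/core/database.py (both names are placeholders):

```python
# app/backend/operations/health.py (sketch; dependency names are assumptions)
from fastapi import APIRouter, Depends
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

from app.backend.core.database import get_db_session  # assumed helper name

router = APIRouter()

@router.get("/health")
@router.get("/health/live")
async def live() -> dict:
    # Liveness only: the process is up; no dependency checks here.
    return {"status": "ok"}

@router.get("/health/ready")
async def ready(db: AsyncSession = Depends(get_db_session)) -> dict:
    # Readiness: fail if the database cannot answer a trivial query.
    await db.execute(text("SELECT 1"))
    return {"status": "ready", "db": "ok"}
```

Keeping /health/live dependency-free means the liveness probe never flaps when a downstream service degrades; only readiness does.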

Deliverables

  • Running FastAPI service in Docker with /health OK and OpenAPI docs available.

3.2 Data Model & Persistence

Goals

  • Support A1–B2 curriculum, lessons, sessions, transcripts, and analytics in PostgreSQL + pgvector.

Tasks

  • Design and implement SQLAlchemy models (a short sketch follows this list):
    • User: profile, locale, target level, subscription plan, preferences.
    • CurriculumObjective: per level (A1–B2), skill (reception, production, interaction, mediation), descriptor text.
    • Lesson: CEFR level, objectives, type (lesson, scenario, exam part), metadata (topic, context).
    • Scenario / ScenarioStep: structured oral tasks (self-presentation, picture description, opinion exchange, arguing a statement) with configuration for timing and mode (individual/pair).
    • Session: persistent conversational session per user (mode, state, summary, last_activity_at).
    • Turn: individual utterances with role (user/AI), timestamps, raw transcript, audio reference, and CEFR difficulty metadata.
    • ExamTemplate / ExamPart: A1–A2, A2–B1, and B1–B2 templates with timing, task types, and scoring dimensions.
    • ExamAttempt / ExamScore: attempt metadata, estimated CEFR level, component scores.
    • UploadDocument / DocumentChunk: files and parsed chunks with vector embeddings (stored alongside or extending the existing backend package under app/backend).
    • ProgressSnapshot: aggregate metrics for dashboards (per user and optionally per program).
    • Subscription / PaymentEvent: billing state and usage limits.
    • Note: Seed the database with the specific plans defined in README.md (First Light, Spark, Glow, Shine, Radiance) and their respective limits.
  • Add related Alembic migrations; verify they run cleanly on dev DB.
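
A sketch of two of these models in SQLAlchemy 2.0 style; column names and the embedding dimension are assumptions to be settled during implementation:

```python
import uuid
from datetime import datetime

from pgvector.sqlalchemy import Vector
from sqlalchemy import ForeignKey, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Session(Base):
    __tablename__ = "sessions"
    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    user_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("users.id"), index=True)
    mode: Mapped[str]  # lesson | mock_exam | free
    summary: Mapped[str | None] = mapped_column(Text)
    last_activity_at: Mapped[datetime | None]

class DocumentChunk(Base):
    __tablename__ = "document_chunks"
    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    document_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("upload_documents.id"), index=True
    )
    content: Mapped[str] = mapped_column(Text)
    # Embedding dimension depends on the chosen model; 1536 is an assumption.
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))
```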

Deliverables

  • Migrations and models aligned with PRD feature set and architecture.

3.3 Authentication & User Management

Goals

  • Secure user auth using FastAPI Users (or equivalent) with JWT and refresh tokens.

Tasks

  • Configure FastAPI Users (sketched after this list):
    • Email/password registration, login, password reset, email verification.
    • Role support (learner, instructor, admin) for curriculum authoring and admin dashboards.
  • Integrate auth into routes (dependencies.py with current_user).
  • Implement users endpoints for profile (target CEFR level, locale) and preferences (greeting verbosity, data retention preferences).
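
A sketch of the FastAPI Users wiring; the User model and get_user_manager locations are assumptions:

```python
import uuid

from fastapi_users import FastAPIUsers
from fastapi_users.authentication import (
    AuthenticationBackend,
    BearerTransport,
    JWTStrategy,
)

from app.backend.features.auth import User, get_user_manager  # assumed location

bearer_transport = BearerTransport(tokenUrl="auth/jwt/login")

def get_jwt_strategy() -> JWTStrategy:
    # The secret should come from app/backend/core/config.py in practice.
    return JWTStrategy(secret="CHANGE_ME", lifetime_seconds=3600)

auth_backend = AuthenticationBackend(
    name="jwt",
    transport=bearer_transport,
    get_strategy=get_jwt_strategy,
)

fastapi_users = FastAPIUsers[User, uuid.UUID](get_user_manager, [auth_backend])

# Reusable dependency for protected routes (e.g., in dependencies.py).
current_user = fastapi_users.current_user(active=True)
```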

Deliverables

  • Auth flows working in backend and callable from Postman / curl.

3.4 Session Management & Transcripts

Goals

  • Provide continuous session behavior with persistent history, as described in docs/architecture.md and PRD.

Tasks

  • Implement GET /sessions/default:
    • Create or fetch Session for current user.
    • Load summary, current lesson state, and progress context.
  • Implement POST /sessions/default/token:
    • Generate a short-lived LiveKit token with room identity tied to the session (see the sketch after this list).
  • Integrate with LiveKit Agent:
    • Implement an LLM integration module (for example, app/backend/features/llm.py) that configures the realtime session using the historical summary, current goals, and mode (lesson/mock exam/free).
  • Implement transcript persistence:
    • Receive partial/final transcripts from LiveKit/agent.
    • Append Turn records and maintain rolling summaries for context.
    • Respect retention settings.
  • Implement post-session summarization endpoint / background job:
    • Generate per-session summary, strengths/weaknesses, recommended next steps.
  • Implement on-demand translation:
    • Endpoint (e.g., /chat/translate) or integrated socket message to translate user/AI text between target and native languages (supporting PRD Section 4.2).
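
A sketch of the token endpoint using the livekit-api package; the room-naming scheme and settings attribute names are assumptions:

```python
from datetime import timedelta

from fastapi import APIRouter, Depends
from livekit import api

from app.backend.core.config import settings            # assumed settings object
from app.backend.api.dependencies import current_user   # assumed location

router = APIRouter()

@router.post("/sessions/default/token")
async def session_token(user=Depends(current_user)) -> dict:
    token = (
        api.AccessToken(settings.livekit_api_key, settings.livekit_api_secret)
        .with_identity(str(user.id))
        .with_ttl(timedelta(minutes=10))  # short-lived, per this plan
        .with_grants(api.VideoGrants(room_join=True, room=f"session-{user.id}"))
    )
    return {"token": token.to_jwt()}
```

Short TTLs keep leaked tokens low-risk; on reconnect the frontend simply requests a fresh token for the same session room.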

Deliverables

  • API and background flows that maintain continuous conversational context per user.

3.5 Curriculum & Lesson APIs

Goals

  • Expose CEFR-aligned curriculum and lesson content to frontend and agent.

Tasks

  • Implement lessons router (sketched after this list):
    • List lessons by level, topic, recommended next steps.
    • Fetch details for a specific lesson, including objectives and scenario steps.
    • Mark lesson progress and completion; update ProgressSnapshot.
  • Implement endpoints for curriculum objectives and mapping to lessons.
  • Implement endpoints to retrieve scenario templates for mock exams and regular lessons.
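
A sketch of the listing endpoint; Lesson field names follow Section 3.2, but the query helpers are assumptions:

```python
from fastapi import APIRouter, Depends
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.backend.core.database import get_db_session    # assumed helper name
from app.backend.features.lessons.models import Lesson  # assumed location

router = APIRouter(prefix="/lessons")

@router.get("")
async def list_lessons(
    level: str | None = None,  # CEFR filter: A1, A2, B1, or B2
    topic: str | None = None,
    db: AsyncSession = Depends(get_db_session),
) -> list[dict]:
    query = select(Lesson)
    if level:
        query = query.where(Lesson.cefr_level == level)
    if topic:
        query = query.where(Lesson.topic == topic)
    lessons = (await db.execute(query)).scalars().all()
    return [
        {"id": str(lesson.id), "title": lesson.title, "level": lesson.cefr_level}
        for lesson in lessons
    ]
```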

Deliverables

  • Stable JSON API for curriculum and lessons, used by frontend and agent system prompts.

3.6 AI-Assisted Authoring & User Uploads

Goals

  • Support instructor-designed and learner-generated lessons from uploaded materials.

Tasks

  • Implement documents router:
    • File upload endpoints for documents and images (instructor and learner scopes).
    • Trigger document processing pipeline (Docling or similar) to parse text and structure.
    • Chunk documents and store embeddings in DocumentChunk using pgvector (see the sketch after this list).
  • Implement instructor authoring endpoints:
    • Create/update/delete lessons referencing uploaded documents/images.
    • AI-assisted suggestion endpoint that uses LLM to propose lesson structure, prompts, and exam-style tasks conditioned on level and objectives.
  • Implement learner upload endpoints:
    • User-specific upload and lesson creation (on-the-fly lessons).
    • Link created “ad-hoc” lessons to sessions so the tutor can reference them during practice.
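
A sketch of the chunk-and-embed step; chunk sizes are placeholders, and embed() stands in for whichever embedding provider is configured:

```python
from app.backend.features.documents.models import DocumentChunk  # assumed location

def chunk_text(text: str, max_chars: int = 1200, overlap: int = 200) -> list[str]:
    """Fixed-size character chunks with overlap; real input comes from Docling."""
    chunks, start = [], 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start : start + max_chars])
        start += step
    return chunks

async def store_chunks(db, document_id, text: str, embed) -> None:
    # embed: async callable str -> list[float]; provider-specific, assumed here.
    for piece in chunk_text(text):
        db.add(DocumentChunk(
            document_id=document_id,
            content=piece,
            embedding=await embed(piece),
        ))
    await db.commit()
```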

Deliverables

  • Endpoints supporting both admin/instructor authoring and user-driven contextual lessons.

3.7 Mock Exam Engine & Scoring

Goals

  • Implement configurable mock oral exam flows for A1–A2, A2–B1, and B1–B2, with B2 focus.

Tasks

  • Implement exam orchestration service:
    • Given an ExamTemplate, manage progression through ExamParts (including warm-up, individual tasks, pair tasks).
    • Enforce timing and mode flags to drive agent prompts.
  • Integrate scoring:
    • Use an LLM to derive component scores (fluency, pronunciation, grammar, vocabulary, coherence) from transcripts (see the sketch after this list).
    • Map to estimated CEFR band and store in ExamScore.
  • Expose endpoints:
    • Start exam, fetch exam status, retrieve past exam results.
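
A sketch of the scoring step; llm_json() is a hypothetical async helper (str -> dict) wrapping the configured LLM, and the band mapping is a naive placeholder pending rubric calibration:

```python
from pydantic import BaseModel, Field

class ComponentScores(BaseModel):
    fluency: float = Field(ge=0, le=5)
    pronunciation: float = Field(ge=0, le=5)
    grammar: float = Field(ge=0, le=5)
    vocabulary: float = Field(ge=0, le=5)
    coherence: float = Field(ge=0, le=5)

CEFR_BANDS = ["A1", "A2", "B1", "B2"]

def estimate_band(scores: ComponentScores) -> str:
    # Average the five components (0-5) and bucket into the four target bands.
    mean = sum(scores.model_dump().values()) / 5
    return CEFR_BANDS[min(int(mean / 1.25), len(CEFR_BANDS) - 1)]

async def score_transcript(transcript: str, llm_json) -> tuple[ComponentScores, str]:
    raw = await llm_json(
        "Score this oral exam transcript from 0 to 5 on fluency, pronunciation, "
        "grammar, vocabulary, and coherence. Return JSON with those five keys.\n\n"
        + transcript
    )
    scores = ComponentScores.model_validate(raw)  # reject malformed LLM output
    return scores, estimate_band(scores)
```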

Deliverables

  • End-to-end exam session that runs via the same LiveKit + agent infrastructure and stores exam results.

3.8 Analytics & Reporting

Goals

  • Provide learner-level dashboards and program-level reporting.

Tasks

  • Implement periodic aggregation (cron/async tasks) populating ProgressSnapshot (sketched after this list).
  • Implement analytics endpoints:
    • Learner metrics (minutes spoken, session counts, trends per skill).
    • Program-level metrics (for instructors/coordinators) with appropriate role-based access.
  • Ensure privacy controls (anonymized or pseudonymized data where required).
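
A sketch of one aggregate feeding ProgressSnapshot; Turn.duration_seconds and a denormalized Turn.user_id are assumptions about the final schema:

```python
from datetime import datetime, timedelta, timezone

from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession

from app.backend.features.sessions.models import Turn  # assumed location

async def minutes_spoken_last_week(db: AsyncSession, user_id) -> float:
    since = datetime.now(timezone.utc) - timedelta(days=7)
    query = (
        select(func.coalesce(func.sum(Turn.duration_seconds), 0))
        .where(Turn.role == "user")
        .where(Turn.user_id == user_id)
        .where(Turn.created_at >= since)
    )
    return (await db.execute(query)).scalar_one() / 60.0
```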

Deliverables

  • Backend API supporting progress dashboards and reports as per PRD.

4. Frontend Workstream (Next.js + LiveKit)

4.1 Foundation & Layout

Goals

  • Production-ready Next.js PWA front-end that matches Avaaz branding and supports auth, routing, and basic pages.

Tasks

  • Initialize Next.js app (per README.md):
    • Configure next.config.js, TypeScript, ESLint, PWA manifest, and global styles.
    • Implement app/layout.tsx with theme, localization provider, and navigation.
    • Implement app/page.tsx, a landing page aligned with product positioning (A1–B2, B2 focus).
  • Implement auth pages (login, register, email verification, forgot password).

Deliverables

  • Frontend skeleton running under Docker, reachable via Caddy in dev stack.

4.2 Chat & Voice Experience

Goals

  • Voice-first conversational UI integrated with LiveKit and backend sessions.

Tasks

  • Build ChatInterface.tsx:
    • Microphone controls, connection status, basic waveform/level visualization.
    • Rendering of AI and user turns with text and visual aids (images, tables) as provided by backend/agent.
    • Translation Support: UI controls to translate specific messages on demand (toggle or click-to-translate).
    • Error states for mic and network issues; text-only fallback UI.
  • Integrate with backend session APIs:
    • On login, call GET /sessions/default, then POST /sessions/default/token.
    • Connect to LiveKit using the token; handle reconnection logic.
  • Display contextual greeting and summary on session start using data returned from sessions API.

Deliverables

  • Usable chat interface capable of sustaining real-time conversation with the AI tutor.

4.3 Curriculum & Lesson UX

Goals

  • Allow learners to browse curriculum, start lessons, and view progress.

Tasks

  • Implement curriculum overview page:
    • Display modules and lessons grouped by CEFR levels (A1–B2).
    • Indicate completion and recommended next lessons.
  • Implement lesson detail page:
    • Show lesson goals, target level, estimated time, and exam-related tags.
    • Start lesson → opens chat view in appropriate mode with lesson context.
  • Integrate progress indicators (streaks, minutes, CEFR band) into dashboard.

Deliverables

  • Navigation and views covering core learning flows described in PRD.

4.4 Mock Exam UX

Goals

  • Implement exam-specific UX consistent with oral exams and PRD.

Tasks

  • Build exam selection page:
    • Allow user to choose exam level (A1–A2, A2–B1, B1–B2/B2-mock).
  • In-session exam UI:
    • Show current exam part, timer, and appropriate instructions.
    • Indicate whether current part is scored or warm-up.
  • Results page:
    • Show estimated CEFR level, component scores, and textual feedback.
    • Provide links to detailed transcripts, audio, and recommended follow-up lessons.

Deliverables

  • End-to-end exam flow from selection to results.

4.5 AI-Assisted Authoring & Upload UX

Goals

  • Provide UIs for instructors and learners to upload content and create lessons.

Tasks

  • Instructor interface:
    • Lesson builder UI with level, objectives, exam part, and document selection.
    • “Generate with AI” action to fetch suggested prompts/structure; edit-in-place and publish.
  • Learner interface:
    • Simple upload flow (document/image) to create ad-hoc practice.
    • Quick-start buttons to jump from uploaded content to a tailored lesson in the chat interface.

Deliverables

  • Authoring tools that map onto backend authoring APIs.

4.6 Analytics & Admin Health Dashboard

Goals

  • Provide admin and instructor dashboards for system health and learner analytics.

Tasks

  • Learner dashboard:
    • Visualize key metrics and streaks, integrated with backend analytics.
  • Instructor/program dashboard:
    • Aggregate usage and progress metrics for groups.
  • Admin health dashboard:
    • Surface backend /health status, LiveKit status, DB health indicators, and LLM connectivity signals.

Deliverables

  • Dashboards that satisfy the PRD's analytics and health visibility requirements.

5. Real-Time & Media Workstream (LiveKit + Agents)

5.1 LiveKit Server & Config

Tasks

  • Use the existing livekit service in app/docker-compose.yml as the basis: keep signaling on port 7880, keep the WebRTC media port range configurable via environment variables (currently defaulting to 60000–60100), and keep the service attached to the shared proxy network used by Caddy.
  • Ensure secure API keys and appropriate room/track settings for voice-only sessions.
  • Configure UDP ports and signaling endpoints (rtc.avaaz.ai → Caddy → livekit:7880) as described in docs/architecture.md and infra/Caddyfile.

5.2 Client Integration

Tasks

  • Wire frontend to LiveKit:
    • Use @livekit/client to join rooms using tokens from backend.
    • Handle reconnection and session resumption.
  • Integrate with backend session and agent orchestration.

5.3 Agent Integration with Realtime LLMs

Tasks

  • Implement a LiveKit Agent (sketched after this list) that:
    • Connects to OpenAI Realtime or Gemini Live according to configuration.
    • Streams user audio and receives streamed AI audio and partial transcripts.
    • Forwards transcripts and metadata to backend for persistence.
  • Implement prompt templates for:
    • Regular lessons, mock exams, free conversation.
    • CEFR-level adaptation and exam-specific tasks.
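
A sketch of the agent entrypoint using the livekit-agents framework with its OpenAI Realtime plugin; the instructions text is a placeholder, and the plugin choice would come from configuration:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai

async def entrypoint(ctx: agents.JobContext) -> None:
    await ctx.connect()  # join the room created for the user's session
    session = AgentSession(
        # Swap in the Gemini Live plugin here when configuration selects it.
        llm=openai.realtime.RealtimeModel(),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are Avaaz, a CEFR-aware speaking coach."),
    )

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```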

Deliverables

  • Stable real-time pipeline from user microphone to LLM and back, integrated with backend logic.

6. Infrastructure & DevOps Workstream

6.1 Docker & Compose

Tasks

  • Define and refine:
    • infra/docker-compose.yml: infra stack (Caddy, Gitea, Gitea runner).
    • app/docker-compose.yml: app stack (frontend, backend, LiveKit, Postgres+pgvector).
  • Configure volumes and networks (proxy network for routing via Caddy).

6.2 CI & CD (Gitea + Actions)

Tasks

  • CI:
    • Extend .gitea/workflows/ci.yml to run linting, type-checking, and tests for backend and frontend once those projects are scaffolded under app/backend and app/frontend.
    • Add build verification for any Docker images produced for the app stack.
  • CD:
    • Use .gitea/workflows/cd.yml as the tag-based deploy workflow, following the deployment approach in docs/architecture.md.
    • Deploy tags v* only if they are on main.
    • Use /health and key endpoints for readiness checks; roll back on failures (see the sketch below).
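
A sketch of a post-deploy readiness gate the CD workflow could invoke; the URL and retry budget are assumptions:

```python
import sys
import time

import httpx

def wait_ready(url: str = "https://avaaz.ai/health/ready", attempts: int = 30) -> int:
    for _ in range(attempts):
        try:
            if httpx.get(url, timeout=5).status_code == 200:
                return 0
        except httpx.HTTPError:
            pass  # service may still be starting; retry
        time.sleep(5)
    return 1  # non-zero exit signals the workflow to roll back

if __name__ == "__main__":
    sys.exit(wait_ready())
```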

6.3 Observability & Monitoring

Tasks

  • Centralize logs and metrics for backend, frontend, LiveKit, and Postgres.
  • Configure alerting for:
    • Application errors.
    • Latency and uptime SLOs for voice and API endpoints.
    • Resource usage (CPU, memory, DB connections).

7. Quality, Security, and Compliance

7.1 Testing Strategy

Tasks

  • Backend:
    • Unit tests for core logic modules (e.g., health checks, config, LLM/document/payment integration) and any data models.
    • Integration tests for auth, sessions, and lessons using httpx + pytest (see the sketch after this list).
  • Frontend:
    • Component tests for core UI (chat, curriculum, dashboards).
    • E2E flows for login, start lesson, start exam, and view progress.
  • Voice stack:
    • Automated sanity checks for LiveKit connectivity and audio round-trip.
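
A sketch of one such integration test; it assumes a create_app() factory in app/backend/main.py and the anyio pytest plugin:

```python
import httpx
import pytest

from app.backend.main import create_app  # assumed factory name

@pytest.mark.anyio
async def test_health_ok() -> None:
    # Exercise the app in-process, without a running server.
    transport = httpx.ASGITransport(app=create_app())
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/health")
    assert response.status_code == 200
    assert response.json()["status"] == "ok"
```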

7.2 Security & Privacy

Tasks

  • Apply OWASP-aligned input validation and output encoding.
  • Enforce HTTPS everywhere via Caddy; HSTS and secure cookies where applicable.
  • Implement appropriate retention and deletion policies for audio and transcripts.
  • Document data handling for learners and institutions (for future legal review).

8. Rollout Plan

8.1 Internal Alpha

  • Run app stack locally for core team.
  • Validate foundational flows: auth, voice session, basic lesson, transcripts.

8.2 Closed Beta

  • Onboard a small cohort of A2–B2 learners and one or two programs.
  • Focus on curriculum fit, tutor behavior, and exam simulation realism.
  • Collect data to refine prompt templates, lesson design, and dashboard UX.

8.3 Public Launch

  • Enable subscription plans and payment.
  • Turn on production monitoring and on-call processes.
  • Iterate on performance, reliability, and content quality based on real usage.