Update PRD, plan, and agent instructions

2025-12-04 08:26:58 +01:00
parent d6b61ae8fb
commit 20d7a66d57
3 changed files with 89 additions and 18 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
--- a/docs/PRD.md
+++ b/docs/PRD.md
@@ -7,23 +7,27 @@
 | Date       | Version | Author           | Description                                                                                  |
 | ---------- | ------- | ---------------- | -------------------------------------------------------------------------------------------- |
 | 2025-12-04 | 0.6.0   | Internal (Codex) | Tightened health-check requirements and ensured all endpoints and criteria are testable.     |
 | 2025-12-04 | 0.5.0   | Internal (Codex) | Added requirements from README/architecture; ensured testable, learner-facing feature coverage. |
 | 2025-12-04 | 0.4.0   | Internal (Codex) | Simplified language, reduced redundancy, clarified non-functional requirements.              |
 | 2025-12-03 | 0.3.0   | Internal (Codex) | Reinforced mobile-first learner flow; clarified spoken-skill focus and A1–B2 oral practice scope. |
 | 2025-12-03 | 0.2.0   | Internal (Codex) | Clarified A1–B2 scope; added curriculum/exam, authoring, persistence, and health requirements. |
 | 2025-12-03 | 0.1.0   | Internal (Codex) | Initial PRD drafted from `README.md` and `docs/architecture.md`.                             |
-**Date:** 2025-12-03  
+**Date:** 2025-12-04  
 **Status:** Draft  
 ## Product Overview
-Avaaz is a mobile and web application featuring a motivating conversational AI tutor powered by advanced agentic capabilities. It teaches oral language skills through structured, interactive, voice-first lessons that adapt to each student’s pace and performance. Avaaz supports learners at any CEFR level from A1 to B2, with the primary outcome being readiness for the CEFR B2 oral proficiency exam and confident participation in real-life conversations in the destination country.
+Avaaz is a mobile and web app with a conversational AI tutor. It teaches speaking skills through structured, interactive, voice-first lessons that adapt to each learner’s pace and performance. Avaaz supports CEFR levels A1–B2, with a primary goal of B2 oral exam readiness and confident real-life conversation in the destination country.
-Avaaz combines a CEFR-aligned curriculum with real-time AI conversation to deliver natural, low-latency speech-to-speech interactions across devices.
+Avaaz combines a CEFR-aligned curriculum with real-time AI conversation to deliver low-latency speech-to-speech practice across devices. Learners primarily use native iOS and Android apps. Instructors, coordinators, and administrators use a responsive web portal to manage curricula, reporting, and settings.
 **Problem Statement:**  
-Adult immigrants and other language learners struggle to achieve confident speaking ability in their target language, especially at the B2 level required for exams, citizenship, or professional roles. Existing solutions (apps, textbooks, group classes) emphasize passive skills (reading, vocabulary drills, grammar) and offer limited opportunities for high-quality spoken practice with immediate, personalized feedback. Human tutors are expensive, scarce in many regions, and difficult to scale, leaving learners underprepared for real-life conversations and high-stakes oral exams.
+Adult immigrants and other language learners struggle to achieve confident speaking ability in their target language, especially at the B2 level required for exams, citizenship, or professional roles. Existing solutions (apps, textbooks, group classes) emphasize passive skills (reading, vocabulary drills, grammar) that do not directly translate into fluent speech. Avaaz intentionally keeps reading and writing as contextual supports only—every lesson, scenario, and assessment is designed around spoken interaction, pronunciation, fluency, and comprehension. Human tutors are expensive, scarce in many regions, and difficult to scale, leaving learners underprepared for real-life conversations and high-stakes oral exams.
 **Product Vision:**  
-To become the trusted AI speaking coach for immigrants and global learners, providing an always-available, personalized conversational tutor that mirrors a human teacher’s strengths—natural dialogue, rich corrective feedback, and realistic scenarios—while scaling to thousands of learners. Avaaz will measurably improve speaking confidence and B2 exam readiness by:
+To be the trusted AI speaking coach for immigrants and global learners. Avaaz should feel like a human tutor—natural dialogue, rich corrective feedback, and realistic scenarios—while scaling to thousands of learners. Avaaz will measurably improve speaking confidence and B2 exam readiness by:
 - Reducing learners’ anxiety in real conversations.
 - Increasing B2 oral exam pass rates.
@@ -31,7 +35,7 @@ To become the trusted AI speaking coach for immigrants and global learners, prov
 ## User and Audience
-Avaaz targets adults learning to speak a new language for migration, work, and social integration, with an initial focus on English → Norwegian Bokmål. The core experience is voice-first, but also works well in noisy or low-bandwidth contexts via text and transcripts. Learners can start as absolute beginners (A1) and progress through A2 and B1 up to B2; the main goal is to help them reach and pass the B2 oral exam while remaining useful at every step in this progression.
+Avaaz serves adults learning a new language for migration, work, and social integration, with an initial focus on English → Norwegian Bokmål. Learners use the mobile apps or web app to practice speaking; text and transcripts are supporting aids, not the main focus. Instructors, coordinators, and administrators use a web portal to manage curricula, monitor cohorts, and configure settings. Learners can start from A1 and progress through A2 and B1 up to B2; the main goal is to help them reach and pass the B2 oral exam while adding value at each stage.
 **Personas:**
@@ -55,16 +59,16 @@ Avaaz targets adults learning to speak a new language for migration, work, and s
 **User Scenarios:**
 - **Daily commute micro-lessons (Primary Persona):**  
-  On the bus after work, a learner opens Avaaz on their phone, starts a 10-minute speaking session on “Small Talk at the Workplace,” and practices greetings with the AI tutor. The system adapts prompts based on errors, gives immediate pronunciation and grammar corrections, and updates the learner’s speaking progress toward their current target level (A1–B2, typically B2 over time).
+  On the bus after work, a learner starts a 10-minute speaking session on “Small Talk at the Workplace.” Avaaz adapts prompts based on mistakes, gives immediate pronunciation and grammar feedback, and updates progress toward the learner’s target level.
 - **Mock B2 oral exam before test day (Primary Persona):**  
-  A week before the exam, the learner runs a full mock oral exam in “Exam Mode.” Avaaz simulates an examiner with timed sections, tracks coherence, fluency, lexical range, and accuracy, and produces an exam-style report with estimated CEFR level and specific improvement suggestions.
+  A week before the exam, the learner runs a full mock oral exam in “Exam Mode.” Avaaz simulates an examiner with timed sections, tracks key speaking skills, and produces an exam-style report with an estimated CEFR level and clear improvement suggestions.
 - **Preparing for workplace interactions (Working Professional):**  
-  Before a performance review, a learner practices the “Performance Review Conversation” scenario. Avaaz role-plays both manager and colleague, uses realistic workplace language, and coaches the learner on polite but assertive phrasing, including cultural norms.
+  Before a performance review, a learner practices the “Performance Review Conversation” scenario. Avaaz role-plays manager and colleague, uses realistic workplace language, and coaches polite but assertive phrasing and cultural norms.
 - **Program-wide monitoring (Program Coordinator):**  
-  An instructor encourages all students to complete three speaking sessions per week. The coordinator reviews aggregated progress dashboards (e.g., minutes spoken, estimated CEFR band, completion of key scenarios) to identify learners who need extra support and to report impact to stakeholders.
+  An instructor encourages all students to complete three speaking sessions per week. The coordinator reviews dashboards (e.g., minutes spoken, estimated CEFR band, completion of key scenarios) to spot learners who need support and to report impact to stakeholders.
 ## Functional Requirements (Features)
@@ -89,7 +93,7 @@ This section describes the core capabilities required for a production-grade Ava
 2. **User Story: Adaptive Conversational Flow**  
   - **As a** learner with uneven skills,  
   - **I want to** receive dynamically adjusted prompts and scaffolding,  
-   - **so that** conversations stay within my zone of proximal development and challenge me appropriately.  
+   - **so that** conversations stay challenging but not overwhelming.  
   **Acceptance Criteria:**  
   - AI tutor adjusts prompt complexity based on recent performance (e.g., error rate, hesitation, completion rate) and current CEFR level (A1–B2).  
@@ -101,7 +105,20 @@ This section describes the core capabilities required for a production-grade Ava
   **Dependencies:** Backend lesson/lesson-state models, LLM prompt engineering and agent logic, PostgreSQL + pgvector for storing session metrics.
-### 2. CEFR-B2 Aligned Curriculum & Real-Life Scenarios
+3. **User Story: Comprehensive Speaking Feedback**  
   - **As a** learner preparing for real conversations and exams,  
   - **I want to** receive detailed feedback on my speaking, not just pronunciation and grammar,  
   - **so that** I understand my strengths and weaknesses across all key speaking skills.  
   **Acceptance Criteria:**  
   - After a lesson or mock exam, the system can display or generate scores or qualitative ratings for fluency, pronunciation, grammar, vocabulary, and coherence.  
   - Feedback includes at least 2–3 concrete examples from the session (e.g., misused word, unclear phrasing, hesitation).  
   - Feedback format is consistent across sessions and mock exams so results are comparable over time.  
   - Learner can view previous feedback reports from a “History” or equivalent section.  
   **Dependencies:** Conversation transcription, scoring and analysis models, feedback formatting logic, persistent storage for feedback reports.
 ### 2. CEFR-Aligned Curriculum & Real-Life Scenarios
 1. **User Story: Structured CEFR-Aligned Path**  
   - **As a** motivated learner starting anywhere between A1 and B2,  
@@ -143,6 +160,19 @@ This section describes the core capabilities required for a production-grade Ava
   **Dependencies:** Backend curriculum data model, admin tooling for curriculum management, reporting/analytics based on objectives.
 4. **User Story: Accent and Cultural Adaptation**  
   - **As a** learner moving to a specific country or region,  
   - **I want to** practice with local accents and culturally appropriate language,  
   - **so that** my speech sounds natural and polite in real life.  
   **Acceptance Criteria:**  
   - Lessons and scenarios can be tagged with destination country/region and typical dialect or accent.  
   - AI tutor can switch between at least one default accent and one local accent where the target language supports it.  
   - Scenarios include common cultural norms and politeness strategies (e.g., formal vs informal address) that the tutor can explain on request.  
   - Coordinators or admins can choose which regional variants are enabled for their learners.  
   **Dependencies:** Content localization by region, voice configuration options for accents, cultural notes in curriculum content, admin configuration UI or settings.
 ### 3. Mock Oral Exam Mode & Assessment
 1. **User Story: Full B2 Mock Exam**  
@@ -230,6 +260,19 @@ This section describes the core capabilities required for a production-grade Ava
   **Dependencies:** Role-based access control, reporting queries, secure data storage and anonymization, UI components for analytics.
 3. **User Story: Gamified Challenges and Rewards**  
   - **As a** learner who struggles to keep a regular speaking habit,  
   - **I want to** earn streaks, badges, and other rewards when I practice,  
   - **so that** I feel motivated to return and build a long-term habit.  
   **Acceptance Criteria:**  
   - System tracks daily and weekly speaking activity and calculates streaks based on defined rules (e.g., at least one completed session per day).  
   - Learners can unlock badges or milestones based on clear criteria (e.g., total minutes spoken, number of sessions, mock exams completed).  
   - Gamification status (streaks, badges, milestones) is visible in the dashboard and updates after each session.  
   - All streak and badge rules are documented in-app so they can be tested and verified.  
   **Dependencies:** Analytics and event tracking, gamification rules engine or logic, frontend components to display streaks and badges.
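
The streak rule in the acceptance criteria above (at least one completed session per day) can be sketched as a small pure function. This is an illustrative sketch, not the project's implementation: the name `current_streak` and the grace rule (a streak still counts if the last session was yesterday) are assumptions.

```python
from datetime import date, timedelta
from typing import Iterable


def current_streak(session_days: Iterable[date], today: date) -> int:
    """Count consecutive days with at least one completed session.

    Assumption: the streak is still alive if the learner practiced
    yesterday but has not practiced yet today.
    """
    days = set(session_days)
    # Start counting from today if practiced, otherwise from yesterday.
    cursor = today if today in days else today - timedelta(days=1)
    streak = 0
    while cursor in days:
        streak += 1
        cursor -= timedelta(days=1)
    return streak
```

Keeping the rule in one pure function makes it straightforward to document in-app and to cover with unit tests, as the last acceptance criterion requires.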
 ### 6. User Accounts, Authentication, and Subscription Management
 1. **User Story: Account Creation and Sign-In**  
@@ -273,6 +316,19 @@ This section describes the core capabilities required for a production-grade Ava
   **Dependencies:** Next.js PWA configuration, centralized state in backend, device/session tracking, secure token handling.
 2. **User Story: Consistent Tutor Experience Across Devices**  
   - **As a** learner who sometimes uses headphones and sometimes speakers,  
   - **I want to** have a consistent AI tutor voice and behavior on all my devices,  
   - **so that** my listening practice is predictable and comfortable.  
   **Acceptance Criteria:**  
   - Tutor voice selection (gender, regional accent) is stored in the user profile and applied to all new sessions on any device.  
   - When a learner changes voice settings on one device, the change is reflected on other devices within one session or logout/login cycle.  
   - At least two distinct tutor voice options are available at launch; more can be added later without breaking existing settings.  
   - A simple test script or admin view can confirm which voice configuration is currently active for a given user.  
   **Dependencies:** Voice provider configuration, user profile settings for voice, frontend settings UI, backend APIs for voice preference storage and retrieval.
 ### 8. AI-Assisted Curriculum and Lesson Authoring
 1. **User Story: Instructor-Designed Lessons with AI Support**  
@@ -333,13 +389,16 @@ This section describes the core capabilities required for a production-grade Ava
 1. **User Story: Backend Health Check Endpoint**  
   - **As a** platform operator,  
-   - **I want to** have a simple backend health check endpoint,  
+   - **I want to** have standard health check endpoints for liveness, readiness, and detailed status,  
-   - **so that** CI/CD pipelines, uptime monitors, and dashboards can verify that the API is reachable and healthy.  
+   - **so that** CI/CD pipelines, uptime monitors, and dashboards can verify that the API is running, ready, and healthy.  
   **Acceptance Criteria:**  
-   - Backend exposes a minimal, authenticated-or-public `GET /health` endpoint returning status information (e.g., service up, database reachable, key dependencies OK).  
+   - Backend exposes three unauthenticated endpoints:  
-   - Health endpoint is used in deployment pipelines and monitoring (e.g., `https://api.<domain>/health`).  
+     - `GET /health/live` returns HTTP 200 and body `"live"` when the process is running.  
-   - Endpoint is lightweight and safe to call frequently.  
+     - `GET /health/ready` returns HTTP 200 and body `"ready"` when critical dependencies are OK, and HTTP 503 otherwise.  
     - `GET /health` returns a JSON body with an overall status field and per-component checks, and uses HTTP 200 for `"pass"` and HTTP 503 for `"fail"`.  
   - Health endpoints are used in deployment pipelines and monitoring (e.g., `https://api.<domain>/health`, `/health/ready`, `/health/live`).  
   - All three endpoints are lightweight enough to be polled every few seconds without impacting users.  
   **Dependencies:** FastAPI route for health checks, integration with basic internal dependency checks (DB, LiveKit, LLM connectivity where feasible).
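
The status mapping for the detailed `GET /health` endpoint can be sketched independently of the web framework. This is a minimal sketch under stated assumptions: the function name `build_health_report`, the JSON keys `status` and `checks`, and the probe names are hypothetical; the PRD only requires an overall status field, per-component checks, and HTTP 200 for `"pass"` versus HTTP 503 for `"fail"`.

```python
from typing import Callable, Dict, Tuple


def build_health_report(
    probes: Dict[str, Callable[[], bool]],
) -> Tuple[int, dict]:
    """Run each dependency probe and derive the HTTP status and JSON body.

    Assumption: a probe that raises an exception counts as a failure.
    """
    results: Dict[str, str] = {}
    for name, probe in probes.items():
        try:
            ok = bool(probe())
        except Exception:
            ok = False
        results[name] = "pass" if ok else "fail"

    # The overall status passes only if every component passes.
    overall = "pass" if all(v == "pass" for v in results.values()) else "fail"
    http_status = 200 if overall == "pass" else 503
    return http_status, {"status": overall, "checks": results}
```

A FastAPI route could call this helper with probes for the database, LiveKit, and LLM connectivity, and return the body with the computed status code; `GET /health/ready` could reuse the same probes and return only `"ready"` or a 503.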
@@ -406,7 +465,15 @@ This section describes the core capabilities required for a production-grade Ava
 - Secrets stored securely (e.g., environment variables, secret manager), never hard-coded in the repository.  
 - Rate limiting, abuse detection, and monitoring around critical endpoints.
-### Non-Functional Requirements
+### Deployment & CI/CD
 - Production deployment uses Docker-based stacks on a single VPS, with a separate infra stack (`caddy`, `gitea`, `gitea-runner`) and app stack (`frontend`, `backend`, `postgres`, `livekit`) defined in version-controlled Compose files.  
 - Caddy terminates TLS for all public domains and routes traffic to the correct internal services (frontend, backend, LiveKit, Gitea) over a shared Docker network.  
 - A Gitea Actions-based CI pipeline runs on each feature branch and pull request, executing backend/frontend tests, static analysis, and image builds, and must pass before merge to `main`.  
 - A tag-based CD pipeline (tags matching `v*` on `main`) builds production images and redeploys the app stack on the VPS in a controlled way, minimizing downtime.  
 - CI/CD workflows are themselves versioned in the repository so changes to validation or deployment steps are reviewable and reproducible.  
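
The tag rule for the CD pipeline (tags matching `v*`) can be illustrated with a short check. This is only a sketch of the matching rule: the helper name `is_release_tag` is hypothetical, the real filter would live in the workflow definition, and this sketch does not verify that the tagged commit is on `main`.

```python
from fnmatch import fnmatchcase

TAG_PREFIX = "refs/tags/"


def is_release_tag(ref: str) -> bool:
    """Return True if a Git ref is a tag whose name matches the glob v*."""
    if not ref.startswith(TAG_PREFIX):
        return False
    tag_name = ref[len(TAG_PREFIX):]
    # fnmatchcase is case-sensitive on every platform.
    return fnmatchcase(tag_name, "v*")
```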
 ### Performance, Reliability, and Scalability
 **Performance:**  
--- a/docs/plan.md
+++ b/docs/plan.md
@@ -84,6 +84,7 @@ The goal is to deliver an end-to-end, voice-first AI speaking coach that support
  - `UploadDocument` / `DocumentChunk` – files and parsed chunks with `vector` embeddings (stored alongside or extending the existing backend package under `app/backend`).
  - `ProgressSnapshot` – aggregate metrics for dashboards (per user and optionally per program).
  - `Subscription` / `PaymentEvent` – billing state and usage limits.
  - **Note:** Seed the database with the specific plans defined in `README.md` (First Light, Spark, Glow, Shine, Radiance) and their respective limits.
 - Add related Alembic migrations; verify they run cleanly on dev DB.
 **Deliverables**
@@ -129,6 +130,8 @@ The goal is to deliver an end-to-end, voice-first AI speaking coach that support
  - Respect retention settings.
 - Implement post-session summarization endpoint / background job:
  - Generate per-session summary, strengths/weaknesses, recommended next steps.
 - Implement on-demand translation:
  - Endpoint (e.g., `/chat/translate`) or integrated socket message to translate user/AI text between target and native languages (supporting PRD Section 4.2).
 **Deliverables**
@@ -248,6 +251,7 @@ The goal is to deliver an end-to-end, voice-first AI speaking coach that support
 - Build `ChatInterface.tsx`:
  - Microphone controls, connection status, basic waveform/level visualization.
  - Rendering of AI and user turns with text and visual aids (images, tables) as provided by backend/agent.
  - **Translation Support:** UI controls to translate specific messages on demand (toggle or click-to-translate).
  - Error states for mic and network issues; text-only fallback UI.
 - Integrate with backend session APIs:
  - On login, call `GET /sessions/default`, then `POST /sessions/default/token`.