diff --git a/docs/PRD.md b/docs/PRD.md index e69de29..648b7d8 100644 --- a/docs/PRD.md +++ b/docs/PRD.md @@ -0,0 +1,501 @@ +# Product Requirements Document + +## Product Information + +**Title:** Avaaz +**Change History:** + +| Date | Version | Author | Description | +| ---------- | ------- | ---------------- | -------------------------------------------------------------------------------------------- | +| 2025-12-03 | 0.2.0 | Internal (Codex) | Clarified A1–B2 scope; added curriculum/exam, authoring, persistence, and health requirements. | +| 2025-12-03 | 0.1.0 | Internal (Codex) | Initial PRD drafted from `README.md` and `docs/architecture.md`. | + +**Date:** 2025-12-03 +**Status:** Draft + +## Product Overview + +Avaaz is a mobile and web application featuring a motivating conversational AI tutor powered by advanced agentic capabilities. It teaches oral language skills through structured, interactive, voice-first lessons that adapt to each student’s pace and performance. Avaaz supports learners at any CEFR level from A1 to B2, with the primary outcome being readiness for the CEFR B2 oral proficiency exam and confident participation in real-life conversations in the destination country. + +Avaaz combines a CEFR-aligned curriculum, real-time AI conversation, and production-grade infrastructure (FastAPI backend, Next.js frontend, LiveKit WebRTC, and PostgreSQL + pgvector) to deliver natural, low-latency speech-to-speech interactions across devices. + +**Problem Statement:** +Adult immigrants and other language learners struggle to achieve confident speaking ability in their target language, especially at the B2 level required for exams, citizenship, or professional roles. Existing solutions (apps, textbooks, group classes) emphasize passive skills (reading, vocabulary drills, grammar) and offer limited opportunities for high-quality spoken practice with immediate, personalized feedback. 
Human tutors are expensive, scarce in many regions, and difficult to scale, leaving learners underprepared for real-life conversations and high-stakes oral exams. + +**Product Vision:** +To become the trusted AI speaking coach for immigrants and global learners, providing an always-available, personalized conversational tutor that mirrors a human teacher’s strengths—natural dialogue, rich corrective feedback, and realistic scenarios—while scaling to thousands of learners. Avaaz will measurably improve speaking confidence and B2 exam readiness by: + +- Reducing learners’ anxiety in real conversations. +- Increasing B2 oral exam pass rates. +- Shortening the time required to progress from A1 → A2 → B1 → B2 speaking proficiency. + +## User and Audience + +Avaaz targets adults learning to speak a new language for migration, work, and social integration, with an initial focus on English → Norwegian Bokmål. The core experience is voice-first, but the app also works in noisy or low-bandwidth contexts via text and transcripts. Learners can start as absolute beginners (A1) and progress through A2 and B1 up to B2; the main goal is to help them reach and pass the B2 oral exam, while the product remains useful at every step in this progression. + +**Personas:** + +- **Primary Persona – Adult Immigrant Exam Candidate:** + - Age 20–45; recently moved to a new country (e.g., Norway). + - Needs to pass a B2 oral exam for residency, citizenship, or professional accreditation. + - Has limited time (work, family duties) and mixed confidence speaking with natives. + - Uses a mid-range phone or laptop; often learns in the evenings and on weekends. + - Pain points: insufficient speaking practice, fear of making mistakes, difficulty accessing affordable tutors, and lack of clear feedback on exam readiness.
+ +- **Secondary Persona – Working Professional Needing Workplace Fluency:** + - Already employed or seeking employment; needs to operate in the target language at work (meetings, clients, daily conversations). + - Wants targeted practice around workplace scenarios (e.g., stand-ups, 1:1s, presentations). + - Pain points: embarrassed about accent/fluency, no safe space to practice, needs domain-specific vocabulary and politeness strategies. + +- **Secondary Persona – Language School / Program Coordinator:** + - Manages groups of learners in a language school, NGO, or integration program. + - Wants a scalable speaking practice tool that complements classes and provides data on learner progress. + - Pain points: limited classroom time, uneven speaking opportunities for students, lack of granular speaking analytics. + +**User Scenarios:** + +- **Daily commute micro-lessons (Primary Persona):** + On the bus after work, a learner opens Avaaz on their phone, starts a 10-minute speaking session on “Small Talk at the Workplace,” and practices greetings with the AI tutor. The system adapts prompts based on errors, gives immediate pronunciation and grammar corrections, and updates the learner’s speaking progress toward their current target level (A1–B2, typically B2 over time). + +- **Mock B2 oral exam before test day (Primary Persona):** + A week before the exam, the learner runs a full mock oral exam in “Exam Mode.” Avaaz simulates an examiner with timed sections, tracks coherence, fluency, lexical range, and accuracy, and produces an exam-style report with estimated CEFR level and specific improvement suggestions. + +- **Preparing for workplace interactions (Working Professional):** + Before a performance review, a learner practices the “Performance Review Conversation” scenario. Avaaz role-plays both manager and colleague, uses realistic workplace language, and coaches the learner on polite but assertive phrasing, including cultural norms. 
+ +- **Program-wide monitoring (Program Coordinator):** + An instructor encourages all students to complete three speaking sessions per week. The coordinator reviews aggregated progress dashboards (e.g., minutes spoken, estimated CEFR band, completion of key scenarios) to identify learners who need extra support and to report impact to stakeholders. + +## Functional Requirements (Features) + +This section describes the core capabilities required for a production-grade Avaaz full-stack application. Each feature is expressed via user stories with acceptance criteria and dependencies. + +### 1. Voice-First Conversational Lessons + +1. **User Story: Real-Time Voice Tutoring** + - **As a** learner between A1 and B2 (with a special focus on B1–B2 exam preparation), + - **I want to** speak with an AI tutor using my microphone in near real time, + - **so that** I can practice spontaneous spoken interaction and receive immediate feedback. + + **Acceptance Criteria:** + - Learner can start a voice session from mobile or web with one tap/click. + - Audio is streamed via WebRTC with end-to-end latency low enough to support natural turn-taking (target < 250 ms one-way). + - AI tutor responds using synthesized voice and on-screen text. + - Both user and tutor audio are transcribed in real time and stored with the session for later review and analytics (subject to retention and privacy policies). + - If the microphone or network fails, the app displays an actionable error and offers a retry or text-only fallback. + + **Dependencies:** LiveKit server (signaling + media), LLM realtime APIs (OpenAI Realtime, Gemini Live), Caddy reverse proxy, WebRTC-capable browsers/mobile clients, backend session orchestration. + +2. **User Story: Adaptive Conversational Flow** + - **As a** learner with uneven skills, + - **I want to** receive dynamically adjusted prompts and scaffolding, + - **so that** conversations stay within my zone of proximal development and challenge me appropriately. 
+ + **Acceptance Criteria:** + - AI tutor adjusts prompt complexity based on recent performance (e.g., error rate, hesitation, completion rate) and current CEFR level (A1–B2). + - System can slow down, rephrase, or switch to simplified questions when the learner struggles. + - System can increase complexity (longer turns, follow-up questions, abstract topics) when the learner performs well. + - Explanations include level-appropriate grammar focus (e.g., simple present and basic word order at A1, more complex clause structures and connectors at B1–B2). + - When explaining, the AI tutor supplements speech with visual and textual aids (images, tables, short written examples) where appropriate. + - Changes in difficulty are logged for analytics. + + **Dependencies:** Backend lesson/lesson-state models, LLM prompt engineering and agent logic, PostgreSQL + pgvector for storing session metrics. + +### 2. CEFR-B2 Aligned Curriculum & Real-Life Scenarios + +1. **User Story: Structured CEFR-Aligned Path** + - **As a** motivated learner starting anywhere between A1 and B2, + - **I want to** follow a clear sequence of speaking lessons mapped to CEFR descriptors, + - **so that** I can track my progress toward B2 and avoid gaps in my skills. + + **Acceptance Criteria:** + - Curriculum is structured into levels (A1, A2, B1, B2) and modules (e.g., “Everyday Life,” “Workplace,” “Public Services”). + - Each speaking lesson includes goals, target CEFR descriptors, example prompts, and success criteria. + - Learner can see which lessons are completed, in progress, or locked. + - The system records completion, time spent, and estimated performance for each lesson. + + **Dependencies:** Backend curriculum models and APIs, frontend curriculum navigation views, content authoring workflow (internal or admin UI), PostgreSQL for storing lesson metadata. + +2. 
**User Story: Immigrant-Focused Real-Life Scenarios** + - **As a** newly arrived immigrant, + - **I want to** practice conversations that match my daily life (e.g., at the doctor, at school, at work, at public offices), + - **so that** I feel confident handling real interactions in my new country. + + **Acceptance Criteria:** + - Library of scenario templates linked to CEFR levels and contexts (workplace, healthcare, school, housing, etc.). + - For each scenario, the AI tutor can role-play multiple participants (e.g., nurse, receptionist, colleague). + - Visual cues (images, documents, forms) can be shown where relevant. + - Scenarios are localizable (e.g., cultural norms, common phrases) per destination country. + - Scenarios can be designed to emphasize key oral communication purposes seen in official exams: self-presentation, describing pictures or situations, exchanging information, expressing opinions, and arguing for or against a statement. + - Scenario templates support both individual and pair/role-play modes, with configurable durations and turn-taking rules. + + **Dependencies:** Media storage for images/documents, LLM prompt templates by scenario, localization framework, content governance and review processes. + +3. **User Story: Curriculum Model with Multi-Skill Objectives** + - **As a** curriculum designer, + - **I want to** model learning objectives for each level (A1–B2) across reception, production, interaction, and mediation skills, + - **so that** I can align the digital curriculum with established language frameworks and reuse it across languages. + + **Acceptance Criteria:** + - For each CEFR level (A1–B2), the curriculum can capture objectives for listening/reading (reception), speaking/writing (production), interaction (dialogue and conversations), and mediation (explaining and rephrasing). + - Lessons and mock exams reference one or more of these objectives, enabling coverage analysis and reporting. 
+ - Objectives and mappings are configurable per language pair and per country-specific curriculum where applicable. + + **Dependencies:** Backend curriculum data model, admin tooling for curriculum management, reporting/analytics based on objectives. + +### 3. Mock Oral Exam Mode & Assessment + +1. **User Story: Full B2 Mock Exam** + - **As a** learner preparing for a B2 oral exam, + - **I want to** take a timed mock exam that follows the official exam structure, + - **so that** I know what to expect and can benchmark my readiness. + + **Acceptance Criteria:** + - System supports predefined exam templates (sections, timings, types of prompts) for levels A1–A2, A2–B1, and B1–B2, based on local exam formats where applicable. + - Exam templates can include warm-up tasks that are not scored, as well as scored tasks. + - Each exam part can be configured as individual or pair conversation, and as one of several task types: self-presentation, describing a picture or situation, speaking about a familiar topic, exchanging views, expressing opinions, and taking a position on a statement with arguments. + - During the exam, the system enforces timing (visible countdown) and turn-taking rules. + - At the end, the learner receives an exam-like report with an estimated CEFR level and component scores (fluency, pronunciation, vocabulary, grammar, coherence). + - Report is saved and viewable later in the “Results” or “History” section. + - The system can optionally present a small number of stretch tasks from the next higher level to detect learners whose skills may exceed the nominal exam level. + + **Dependencies:** Assessment rubric definitions, scoring models (LLM-based + heuristic), backend report generation, persistent storage of exam sessions and scores. + +2. 
**User Story: Performance Summaries After Each Session** + - **As a** learner who just completed a session, + - **I want to** see a concise summary of what I did well and what to improve, + - **so that** I can focus my next practice and see my progress over time. + + **Acceptance Criteria:** + - Post-session screen shows key strengths, common errors, and 2–3 prioritized recommendations. + - Summary highlights examples from the conversation (e.g., misused prepositions, pronunciation errors). + - Learner can share or export summaries (e.g., PDF or link) where allowed. + - Summaries contribute to longitudinal analytics (trends by skill over time). + + **Dependencies:** Conversation transcription, error detection pipeline, LLM feedback processing, analytics storage and querying. + +### 4. Multilingual Scaffolding & Integrated Translation + +1. **User Story: Localized UI and Instructions** + - **As a** learner with limited proficiency in the target language, + - **I want to** see the app’s UI and core instructions in my native or preferred language, + - **so that** I can focus on speaking practice without being blocked by an interface I cannot yet read. + + **Acceptance Criteria:** + - App supports multiple UI languages with a clear selector during onboarding and in settings. + - Static text (menus, buttons, error messages) is localized. + - Critical flows (onboarding, subscription, exam mode) are fully localized. + - Default UI language is inferred from the device locale but can always be overridden by the user. + + **Dependencies:** Localization/i18n system on frontend and backend, translation management process, design support for longer text variants. + +2. **User Story: On-Demand Translations During Practice** + - **As a** low-confidence speaker, + - **I want to** quickly translate AI prompts or my own utterances between my language and the target language, + - **so that** I can stay engaged rather than getting stuck on unknown words.
+ + **Acceptance Criteria:** + - In-session controls allow optional translations of AI messages and user messages. + - Translation support is clearly marked and can be disabled by instructors (to reduce over-reliance). + - Translation usage is logged for analytics (e.g., frequency by user, session). + - Translations are fast enough to not break conversational flow. + + **Dependencies:** LLM-based or external translation APIs, usage limits and cost management, UI surface in chat and transcripts. + +### 5. Progress Tracking, Gamification, and Analytics + +1. **User Story: Personal Progress Dashboard** + - **As a** learner targeting a CEFR speaking level (A1–B2), + - **I want to** see my progress over time across key skills, + - **so that** I stay motivated and know where to focus and, ultimately, reach my target (often B2). + + **Acceptance Criteria:** + - Dashboard shows time spent speaking, session count, streaks, and estimated CEFR band over time. + - Learner can view trends in specific skill dimensions (fluency, pronunciation, grammar, vocabulary). + - Streaks, badges, and milestones are clearly displayed, with rules explained. + - Data refreshes near-real-time after a session. + + **Dependencies:** Analytics database structures, data aggregation jobs, frontend charts, privacy/consent handling. + +2. **User Story: Program-Level Reporting (Secondary Persona)** + - **As a** coordinator of a small group of learners, + - **I want to** see anonymized or per-learner usage and progress, + - **so that** I can measure impact and intervene early for learners who are falling behind. + + **Acceptance Criteria:** + - Secure, role-based access for coordinators/instructors. + - Metrics include active learners, sessions per week, minutes spoken, and average skill trends. + - Simple export (CSV or PDF) for reporting. + - Data access respects privacy settings and relevant regulations. 
+ + **Dependencies:** Role-based access control, reporting queries, secure data storage and anonymization, UI components for analytics. + +### 6. User Accounts, Authentication, and Subscription Management + +1. **User Story: Account Creation and Sign-In** + - **As a** new learner, + - **I want to** create an account using my email and password (and optionally social login), + - **so that** my progress, preferences, and subscriptions are stored securely. + + **Acceptance Criteria:** + - Email + password registration with verification flow. + - Login with JWT-based sessions; secure password hashing in storage. + - Basic account management (confirm email, change email, password, profile data). + - Session expiry and logout behaviors are clearly implemented. + + **Dependencies:** FastAPI Users (or equivalent auth library), PostgreSQL `user` table/schema, email service for verification, frontend auth flows. + +2. **User Story: Subscription Plans and Billing** + - **As a** serious learner, + - **I want to** choose a subscription plan that fits my needs (e.g., free tier, standard, premium), + - **so that** I can access the right level of usage and features. + + **Acceptance Criteria:** + - Plan definitions (e.g., “Spark,” “Glow,” etc.) with clearly described limits (minutes per month, features like mock exam mode). + - Billing integrated with Stripe (or similar) for recurring subscriptions. + - System enforces plan limits gracefully (e.g., warn at 80% usage, block after limit with clear upgrade options). + - Admin tooling to manage plans and handle refunds/adjustments. + + **Dependencies:** Payment service integration (Stripe), secure webhook handling, backend plan enforcement, accounting/ledger storage. + +### 7. Cross-Device Learning Continuity + +1. 
**User Story: Seamless Device Switching** + - **As a** learner who uses both phone and laptop, + - **I want to** continue my learning across devices without losing progress, + - **so that** I can practice whenever and wherever it’s convenient. + + **Acceptance Criteria:** + - Sessions, progress, and settings are stored server-side and synced across devices. + - Resume-last-lesson feature available on login. + - PWA support on mobile for near-native experience and offline access to limited features (where feasible). + - Conflict-handling behaviors are defined (e.g., two devices active at once). + + **Dependencies:** Next.js PWA configuration, centralized state in backend, device/session tracking, secure token handling. + +### 8. AI-Assisted Curriculum and Lesson Authoring + +1. **User Story: Instructor-Designed Lessons with AI Support** + - **As a** language instructor or admin, + - **I want to** design and manage lessons for each CEFR level (A1–B2) with AI support, using documents or images as the basis for lessons, + - **so that** I can efficiently create high-quality, curriculum-aligned speaking practice tailored to my learners. + + **Acceptance Criteria:** + - Instructors can upload documents and images (e.g., forms, articles, exam prompts, everyday photos) into the system. + - The backend parses and indexes uploaded material (e.g., via Docling and embeddings) so that AI can reference it during lessons. + - Instructors can select target level(s), objectives, and exam formats when creating or editing a lesson. + - AI suggests lesson structures, prompts, and example dialogues that instructors can review and modify before publishing. + - Lessons are stored with metadata (level, skills, topics, exam parts) and become available in the learner curriculum and mock exams. + + **Dependencies:** Document upload and processing services, LLM-based content generation, instructor/admin UI, PostgreSQL + pgvector storage. + +2. 
**User Story: Learner-Generated Lessons from Uploaded Material** + - **As a** learner, + - **I want to** upload documents or images that are relevant to my life or exams and have the AI tutor use them as the basis for a lesson, + - **so that** my practice feels directly useful and is adapted to my current level (A1–B2). + + **Acceptance Criteria:** + - Learners can upload files (e.g., work documents, letters from authorities, school forms, pictures from daily life) from web or mobile. + - System detects or uses the learner’s current CEFR level to adapt the conversation difficulty and grammar focus appropriately. + - AI tutor uses the uploaded material as shared context (e.g., refers to specific sections of a document or objects in an image) during the lesson. + - Uploaded content is stored securely, scoped to the learner/account or organization according to configuration and privacy requirements. + + **Dependencies:** Same document ingestion pipeline as instructor authoring, user-facing upload UI, LLM prompts conditioned on user level and uploaded context. + +### 9. Persistent Conversations, Transcripts, and Tutor Greetings + +1. **User Story: Persistent Conversation History and Context Loading** + - **As a** returning learner, + - **I want to** have my previous conversations, transcripts, and progress persisted and used to initialize new lessons, + - **so that** the AI tutor can pick up where we left off and provide a sense of continuity. + + **Acceptance Criteria:** + - Audio from both the user and the AI tutor is always transcribed and stored persistently with each session (subject to retention and privacy policies). + - Each session stores metadata including date, mode (lesson, mock exam, free conversation), level, topics, and key performance indicators. + - The backend exposes an endpoint (e.g., `/sessions/default`) that returns or creates a persistent conversational session containing historical summaries and progress context.
+ - When a user starts a new lesson, the AI tutor’s context includes a short summary of recent sessions plus key goals and challenges. + + **Dependencies:** Session and transcript storage in PostgreSQL + pgvector, summarization logic in backend LLM services, session management API, LiveKit session orchestration. + +2. **User Story: Contextual Greeting on Login** + - **As a** returning learner, + - **I want to** hear a short spoken greeting from the AI tutor that reminds me where I left off previously, + - **so that** I immediately know what I was working on and can resume with confidence. + + **Acceptance Criteria:** + - After login and reconnecting to the conversational session, the AI tutor greets the user verbally and gives a brief, level-appropriate summary of their most recent activity and suggested next step. + - Greeting content is generated from stored summaries and progress records, not from scratch each time. + - Learners can adjust how much historical detail is included (e.g., “short summary only” vs. “more detailed recap”). + + **Dependencies:** Same as for persistent conversation history; frontend behavior to play greeting early in the session and surface a text version of the summary. + +### 10. Health Checks and Admin Observability + +1. **User Story: Backend Health Check Endpoint** + - **As a** platform operator, + - **I want to** have a simple backend health check endpoint, + - **so that** CI/CD pipelines, uptime monitors, and dashboards can verify that the API is reachable and healthy. + + **Acceptance Criteria:** + - Backend exposes a minimal, authenticated-or-public `GET /health` endpoint returning status information (e.g., service up, database reachable, key dependencies OK). + - Health endpoint is used in deployment pipelines and monitoring (e.g., `https://api./health`). + - Endpoint is lightweight and safe to call frequently. 
+ + **Dependencies:** FastAPI route for health checks, integration with basic internal dependency checks (DB, LiveKit, LLM connectivity where feasible). + +2. **User Story: Admin Health Dashboard in Frontend** + - **As an** admin or operator, + - **I want to** view a dashboard in the frontend showing the health of core components, + - **so that** I can quickly detect and diagnose issues without logging into servers directly. + + **Acceptance Criteria:** + - Frontend provides an admin-only view that aggregates health data for frontend, backend, database, LiveKit, and external LLM APIs. + - Dashboard polls or subscribes to backend health endpoints and visualizes status (e.g., up/down, latency, last check time). + - Critical issues are highlighted and optionally surfaced as alerts/notifications. + + **Dependencies:** Backend health endpoints and metrics, role-based access control, frontend admin UI components. + +## Non-Functional Requirements (Technical) + +### Frontend Requirements + +**Supported Browsers/Devices:** + +- Desktop: Latest 2 versions of Chrome, Firefox, Safari, and Edge. +- Mobile: Latest 2 major versions of iOS and Android (Safari/Chrome), including PWA install support. +- Minimum viewport: responsive layouts down to 360px width. + +**Design/UI:** + +- Voice-first interaction prioritized: microphone and conversation views are obvious and usable with one hand on mobile. +- Consistent brand identity for Avaaz across web and mobile (colors, typography, logo). +- Dark and light modes preferred for accessibility and comfort. +- UI is localized alongside backend error messages; language selection is a first-class setting. +- UI components built in React/Next.js with Tailwind or equivalent utility-first styling, following design system guidelines (buttons, forms, cards, modals). 
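The `GET /health` endpoint from the health-check requirements above can be sketched as a small aggregation over per-dependency probes. This is a minimal illustrative sketch, not the final API: `build_health_report`, the probe names, and the payload shape are all assumptions.

```python
from typing import Callable

def build_health_report(checks: dict[str, Callable[[], bool]]) -> dict:
    """Run lightweight dependency probes and aggregate an overall status."""
    results: dict[str, str] = {}
    for name, probe in checks.items():
        try:
            results[name] = "ok" if probe() else "degraded"
        except Exception:
            # A probe that raises (e.g., connection refused) marks the dependency down.
            results[name] = "down"
    overall = "ok" if all(v == "ok" for v in results.values()) else "degraded"
    return {"status": overall, "checks": results}

# Hypothetical probes; real ones would ping Postgres, LiveKit, and the LLM APIs
# with short timeouts so the endpoint stays cheap and safe to call frequently.
report = build_health_report({
    "database": lambda: True,
    "livekit": lambda: True,
})
print(report)  # {'status': 'ok', 'checks': {'database': 'ok', 'livekit': 'ok'}}
```

In the actual backend, a function like this would back the FastAPI route (e.g., mounted under `@app.get("/health")`), returning 200 when `status` is `ok` and 503 otherwise.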
+ +### Backend & Database Requirements + +**API Specifications:** + +- RESTful JSON APIs served by a FastAPI backend for core operations: authentication, user management, lessons, sessions, progress, subscriptions. +- Real-time endpoints for voice and agent control: + - LiveKit signaling endpoints (`/sessions/default`, `/sessions/default/token` and equivalents). + - WebSocket or WebRTC connections from backend to LLM realtime APIs (OpenAI Realtime, Gemini Live). +- API documentation exposed via OpenAPI/Swagger (and/or equivalent documentation tooling). +- All APIs versioned (e.g., `/api/v1/...`) with change management. + +**Database Schema (High-Level):** + +- PostgreSQL with pgvector for semantic search and embeddings. +- Core entities include (indicative, not exhaustive): + - `User` (profile, locale, level, subscription plan). + - `Session` (conversation metadata, timestamps, mode, links to transcripts). + - `Lesson` / `Scenario` (curriculum structure, CEFR mapping). + - `ProgressSnapshot` (aggregated metrics per user over time). + - `Subscription` / `Payment` (plan, billing status, Stripe references). + - `Embedding` / `Document` (semantic chunks for search and content retrieval). +- Migrations managed via Alembic, with reproducible dev/prod schemas. + +**Security:** + +- All traffic between client and server encrypted via HTTPS (Caddy as reverse proxy with automatic TLS). +- Authentication via JWT or session tokens implemented with FastAPI Users (or equivalent), with configurable token lifetimes and refresh flows. +- Passwords stored with modern hashing algorithms (e.g., Argon2, bcrypt). +- Role-based access control (e.g., learner, coordinator, admin) for sensitive features (analytics, content management). +- Strict input validation and output encoding following OWASP best practices. +- Secrets stored securely (e.g., environment variables, secret manager), never hard-coded in the repository. 
+- Rate limiting, abuse detection, and monitoring around critical endpoints. + +### System-Wide Requirements + +**Performance:** + +- Target median API response time: < 200 ms for standard JSON endpoints under normal load. +- Voice interaction round-trip (user speaks → AI responds) tuned for natural conversation with minimal perceived delay; target < 1.5 seconds for most responses. +- System supports multiple concurrent sessions per LiveKit instance and scales horizontally as needed. +- Efficient use of LLM realtime APIs with streaming responses and graceful handling of network jitter. + +**Reliability & Availability:** + +- Initial production target availability: ≥ 99.5%, with a path to ≥ 99.9% as usage grows. +- Health checks for all containers (frontend, backend, LiveKit, Postgres) integrated with Docker Compose and any orchestration layer, plus the explicit backend `/health` endpoint and frontend admin dashboard described above. +- Graceful degradation: if LLM APIs or LiveKit are temporarily unavailable, the system provides clear messaging to learners and surfaces status indicators in the admin dashboard. +- Regular automated backups of PostgreSQL and configuration; tested restore procedures. + +**Scalability:** + +- Docker-based deployment on a production VPS, with clear separation between infra stack (Caddy, Gitea) and app stack (frontend, backend, LiveKit, Postgres). +- Horizontal scaling supported for stateless services (frontend, backend, LiveKit) and vertical scaling for PostgreSQL as needed. +- Efficient connection pooling for database access. +- Architecture designed to move from a single VPS to managed services or Kubernetes in the future without large rewrites. + +**Technical Specifications:** + +- **Frontend:** Next.js (React, TypeScript), Tailwind or equivalent, PWA enabled; communicates with backend via HTTPS and LiveKit via WebRTC. +- **Backend:** FastAPI (Python), Uvicorn/Gunicorn, Pydantic for validation, structured services for LLM, payments, and documents.
+- **Real-time/Media:** LiveKit server for WebRTC signaling and media; integration with LiveKit Agent framework for AI tutor. +- **Database:** PostgreSQL + pgvector; migrations via Alembic. +- **LLM Providers:** OpenAI Realtime API, Google Gemini Live API (WebSocket/WebRTC). +- **Infra:** Caddy reverse proxy, Docker Compose for local and production stacks, Gitea + Actions for CI/CD. +- **Testing/Quality:** Pytest, Hypothesis, httpx for API testing, Ruff and Pyright for linting and static analysis, ESLint for frontend. + +**Accessibility:** + +- Compliance target: WCAG 2.1 AA for web UI. +- All key actions accessible via keyboard and screen readers. +- Sufficient color contrast and scalable font sizes. +- Voice-first design complemented by transcripts and captions; learners can read as well as listen. +- Consideration for hearing- or speech-impaired users where feasible (e.g., text-only practice, adjustable speech rate). + +## Metrics & Release Plan + +**Success Metrics (KPIs):** + +- **Learning Outcomes:** + - ≥ 60% of learners who complete a defined program (e.g., 30+ speaking sessions) report increased speaking confidence. + - ≥ 50% of learners who use Avaaz consistently (e.g., 3+ sessions/week for 8 weeks) pass the B2 oral exam on their first or second attempt. +- **Engagement:** + - Weekly active learners (WAL) growth rate. + - Median speaking minutes per active learner per week. + - Retention (e.g., 4-week and 12-week). +- **Product Quality:** + - Average session rating / NPS for speaking sessions. + - Error rates and crash-free sessions on mobile/web. + - Latency metrics for voice interactions. + +**Timeline & Milestones:** + +- **Phase 1 – Foundation (M0–M2):** + - Implement core architecture (backend, frontend, LiveKit, LLM integrations). + - Basic authentication, user accounts, and minimal speaking session flow. + - Internal alpha with team and close collaborators. 
+- **Phase 2 – Beta Learning Experience (M3–M4):**
+  - CEFR-aligned curriculum MVP, immigrant-focused scenarios, post-session summaries.
+  - Progress dashboard and early gamification (streaks, minutes).
+  - Invite-only beta with small learner cohorts; collect qualitative and quantitative feedback.
+- **Phase 3 – Exam & Scale Readiness (M5–M6):**
+  - Mock B2 exam mode and robust assessment reports.
+  - Subscription plans and billing.
+  - Production hardening (observability, backups, reliability SLOs).
+  - Public launch in the initial target market(s).
+
+**Release Criteria:**
+
+- Core features (voice-first lessons, CEFR-aligned curriculum, post-session feedback, and at least one full mock exam template) are stable and usable.
+- User authentication, subscription management, and payment flows validated in staging and production.
+- The system meets agreed performance thresholds (latency, error rates) under expected early-production load.
+- No open critical security vulnerabilities; penetration testing and reviews completed for auth, payments, and data storage.
+- Documentation available for learners (help center) and internal teams (runbooks, API docs).
+
+**Potential Risks & Assumptions:**
+
+- **Risks:**
+  - Dependence on external LLM realtime APIs and their SLAs, pricing, and model changes.
+  - WebRTC and audio performance may vary across networks and devices, impacting perceived quality.
+  - Assessment accuracy (CEFR-level estimates) may not initially match human examiner judgments, affecting learner trust.
+  - Regulatory or data privacy constraints (e.g., storing voice data, cross-border data flows) may impact certain markets.
+- **Assumptions:**
+  - Learners have access to a smartphone or laptop with a microphone and a sufficiently stable internet connection for audio sessions.
+  - LLM providers continue to support low-latency realtime APIs suitable for spoken dialogue.
+ - Target institutions and exam boards accept AI-supported practice tools as preparation, even if they do not formally endorse them. + - Initial go-to-market focuses on a limited set of language pairs (e.g., English → Norwegian Bokmål) with potential expansion later. diff --git a/docs/architecture.md b/docs/architecture.md index 636cbf1..336429f 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -5,7 +5,7 @@ Below is a summary of the **Production VPS** and **Development Laptop** architec ```mermaid flowchart LR %% Client - A(Browser / PWA) + A(Browser) Y(iOS App / Android App) subgraph User @@ -27,7 +27,7 @@ flowchart LR I(Gitea + Actions + Repositories) J(Gitea Runner) - D(Next.js Frontend) + D(React/Next.js/Tailwind Frontend) E(FastAPI Backend + Agent Runtime) G(LiveKit Server) H[(PostgreSQL + pgvector)] @@ -129,7 +129,7 @@ Docker Compose from `./app/docker-compose.yml` is cloned to `/srv/app/docker-com | Container | Description | | ---------- | ----------------------------------------------------------------------------------------- | -| `frontend` | **Next.js Frontend** – SPA/PWA interface served from a Node.js-based Next.js server. | +| `frontend` | **React/Next.js/Tailwind Frontend** – SPA interface served from a Node.js-based Next.js server. | | `backend` | **FastAPI + Uvicorn Backend** – API, auth, business logic, LiveKit orchestration, agent. | | `postgres` | **PostgreSQL + pgvector** – Persistent relational database with vector search. | | `livekit` | **LiveKit Server** – WebRTC signaling plus UDP media for real-time audio and data. 
| @@ -152,7 +152,7 @@ The `backend` uses several Python packages such as UV, Ruff, FastAPI, FastAPI Us | -------------------- | :---------: | -------------- | -------------------------------- | | **www\.avaaz\.ai** | CNAME | avaaz.ai | Marketing / landing site | | **avaaz.ai** | A | 217.154.51.242 | Root domain | -| **app.avaaz.ai** | A | 217.154.51.242 | Next.js frontend (SPA/PWA) | +| **app.avaaz.ai** | A | 217.154.51.242 | React/Next.js/Tailwind frontend (SPA) | | **api.avaaz.ai** | A | 217.154.51.242 | FastAPI backend | | **rtc.avaaz.ai** | A | 217.154.51.242 | LiveKit signaling + media | | **git.avaaz.ai** | A | 217.154.51.242 | Gitea (HTTPS + SSH) | @@ -448,7 +448,7 @@ The user experiences this as a **continuous, ongoing session** with seamless rec #### App Stack (local Docker) -- `frontend` (Next.js SPA) +- `frontend` (React/Next.js/Tailwind SPA) - `backend` (FastAPI) - `postgres` (PostgreSQL + pgvector) - `livekit` (local LiveKit Server) @@ -465,7 +465,7 @@ No Caddy is deployed locally; the browser talks directly to the mapped container Local development uses: -- `http://localhost:3000` → frontend (Next.js dev/server container) +- `http://localhost:3000` → frontend (React/Next.js/Tailwind dev/server container) - `http://localhost:8000` → backend API (FastAPI) - Example auth/session endpoints: - `POST http://localhost:8000/auth/login` @@ -480,7 +480,7 @@ No `/etc/hosts` changes or TLS certificates are required; `localhost` acts as a | Port | Protocol | Purpose | |-------------:|:--------:|------------------------------------| -| 3000 | TCP | Frontend (Next.js) | +| 3000 | TCP | Frontend (React/Next.js/Tailwind) | | 8000 | TCP | Backend API (FastAPI) | | 5432 | TCP | Postgres + pgvector | | 7880 | TCP | LiveKit HTTP + WS signaling | diff --git a/docs/norwegian/eksempler-pa-oppgaver-i-delproven-i-muntlig-kommunikasjon-niva-a1-a2-bokmal.pdf b/docs/norwegian/eksempler-pa-oppgaver-i-delproven-i-muntlig-kommunikasjon-niva-a1-a2-bokmal.pdf new file mode 100644 
index 0000000..4453ea9
Binary files /dev/null and b/docs/norwegian/eksempler-pa-oppgaver-i-delproven-i-muntlig-kommunikasjon-niva-a1-a2-bokmal.pdf differ
diff --git a/docs/norwegian/eksempler-pa-oppgaver-i-delproven-i-muntlig-kommunikasjon-niva-a2-b1-bokmal.pdf b/docs/norwegian/eksempler-pa-oppgaver-i-delproven-i-muntlig-kommunikasjon-niva-a2-b1-bokmal.pdf
new file mode 100644
index 0000000..10c7540
Binary files /dev/null and b/docs/norwegian/eksempler-pa-oppgaver-i-delproven-i-muntlig-kommunikasjon-niva-a2-b1-bokmal.pdf differ
diff --git a/docs/norwegian/eksempler-pa-oppgaver-i-delproven-i-muntlig-kommunikasjon-niva-b1-b2-bokmal.pdf b/docs/norwegian/eksempler-pa-oppgaver-i-delproven-i-muntlig-kommunikasjon-niva-b1-b2-bokmal.pdf
new file mode 100644
index 0000000..66e9409
Binary files /dev/null and b/docs/norwegian/eksempler-pa-oppgaver-i-delproven-i-muntlig-kommunikasjon-niva-b1-b2-bokmal.pdf differ
diff --git a/docs/norwegian/læreplan i norsk for voksne innvandrere Bokmål.pdf b/docs/norwegian/læreplan i norsk for voksne innvandrere Bokmål.pdf
new file mode 100644
index 0000000..87995f5
Binary files /dev/null and b/docs/norwegian/læreplan i norsk for voksne innvandrere Bokmål.pdf differ
diff --git a/docs/norwegian/oppgaver til muntlig prøven.md b/docs/norwegian/oppgaver til muntlig prøven.md
new file mode 100644
index 0000000..d73212b
--- /dev/null
+++ b/docs/norwegian/oppgaver til muntlig prøven.md
@@ -0,0 +1,59 @@
+# How the tasks on the oral exam work
+
+## Briefly introduce yourself (on the A1–A2 exam)
+
+* This is an individual task.
+* You give a short introduction of yourself, and you choose which information to include.
+* The task is always the same.
+* You speak for approximately 1–2 minutes.
+
+## Describe a picture (on the A1–A2 exam)
+
+* This is an individual task.
+* You are shown a picture with several people and things happening, and you describe what you see in the picture and what the people in it are doing.
+* You may get follow-up questions from the examiner.
+* You speak for approximately 2–3 minutes.
+
+## Talk together about a topic (on the A1–A2 exam)
+
+* This is a conversation task in which two candidates talk together.
+* You are given a question by the examiner about a topic from everyday life, for example school, leisure, work, family, food, the weather, and so on.
+* You may get follow-up questions from the examiner.
+* You speak for approximately 2–3 minutes.
+
+## Talk about a topic (on the A1–A2 and A2–B1 exams)
+
+* This is an individual task.
+* You are given a question by the examiner about an everyday topic to talk about, for example school, leisure, work, family, food, the weather, and so on.
+* You may get follow-up questions from the examiner.
+* You speak for approximately 2–3 minutes.
+
+## Talk together about a topic (on the A2–B1 exam)
+
+* This is a conversation task in which two candidates talk together.
+* You are given a question or an issue to discuss together.
+* You may get follow-up questions from the examiner.
+* You speak for approximately 5–7 minutes in total.
+
+## Give your opinion on a topic and justify it (on the A2–B1 and B1–B2 exams)
+
+* This is an individual task.
+* You are given a question or an issue, and you say what you think about it.
+* You must justify your opinion.
+* You may get follow-up questions from the examiner. You speak for approximately 2–3 minutes.
+
+## Exchange opinions on a topic and justify them (on the B1–B2 exam)
+
+* This is a conversation task in which two candidates talk together.
+* You are given a question or an issue to discuss together and exchange opinions about.
+* You must justify your opinions.
+* The examiner does not ask follow-up questions in this task.
+* You speak for approximately 5–7 minutes in total.
+
+## Take a stance on a statement and justify your views (on the B1–B2 exam)
+
+* This is an individual task.
+* You are presented with a statement, and you must take a stance on it and justify your views.
+* You both hear the statement and see it in writing. You are then offered some time to think and note down keywords.
+* In this task, you first speak independently about the statement for 2–3 minutes before the examiner moves on to asking follow-up questions for 2–3 minutes.
+* In total, you speak for approximately 4–6 minutes.