Add app scaffold and workflows
All checks were successful
Continuous Integration / Validate and test changes (push) Successful in 3s

2025-12-03 08:58:34 +01:00
parent 5a8b773e40
commit d6b61ae8fb
51 changed files with 10252 additions and 3 deletions


@@ -17,7 +17,7 @@
Avaaz is a mobile and web application featuring a motivating conversational AI tutor powered by advanced agentic capabilities. It teaches oral language skills through structured, interactive, voice-first lessons that adapt to each student's pace and performance. Avaaz supports learners at any CEFR level from A1 to B2, with the primary outcome being readiness for the CEFR B2 oral proficiency exam and confident participation in real-life conversations in the destination country.
-Avaaz combines a CEFR-aligned curriculum, real-time AI conversation, and production-grade infrastructure (FastAPI backend, Next.js frontend, LiveKit WebRTC, and PostgreSQL + pgvector) to deliver natural, low-latency speech-to-speech interactions across devices.
+Avaaz combines a CEFR-aligned curriculum with real-time AI conversation to deliver natural, low-latency speech-to-speech interactions across devices.
**Problem Statement:**
Adult immigrants and other language learners struggle to achieve confident speaking ability in their target language, especially at the B2 level required for exams, citizenship, or professional roles. Existing solutions (apps, textbooks, group classes) emphasize passive skills (reading, vocabulary drills, grammar) and offer limited opportunities for high-quality spoken practice with immediate, personalized feedback. Human tutors are expensive, scarce in many regions, and difficult to scale, leaving learners underprepared for real-life conversations and high-stakes oral exams.
@@ -81,7 +81,7 @@ This section describes the core capabilities required for a production-grade Ava
- Learner can start a voice session from mobile or web with one tap/click.
- Audio is streamed via WebRTC with end-to-end latency low enough to support natural turn-taking (target < 250 ms one-way).
- AI tutor responds using synthesized voice and on-screen text.
-- Both user and tutor audio are transcribed in real time and stored with the session for later review and analytics (subject to retention and privacy policies).
+- Transcription and persistence of audio and text follow the persistent conversation and transcript requirements described below.
- If the microphone or network fails, the app displays an actionable error and offers a retry or text-only fallback.
**Dependencies:** LiveKit server (signaling + media), LLM realtime APIs (OpenAI Realtime, Gemini Live), Caddy reverse proxy, WebRTC-capable browsers/mobile clients, backend session orchestration.
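
In this architecture, starting a session is a small backend step: the API mints a short-lived LiveKit access token that the mobile or web client uses to join the WebRTC room where the tutor agent is listening. A minimal FastAPI sketch, assuming the stack named in the overview; the endpoint path, query parameter, environment variable names, and room-naming scheme are illustrative assumptions, not the project's actual API:

```python
import os

from fastapi import FastAPI
from livekit import api  # livekit-api server SDK

app = FastAPI()

@app.post("/sessions/{lesson_id}/token")
def create_session_token(lesson_id: str, learner_id: str) -> dict:
    """Mint a short-lived LiveKit token so the client can join a voice room."""
    room = f"lesson-{lesson_id}"  # assumed room-naming scheme
    token = (
        api.AccessToken(
            os.environ["LIVEKIT_API_KEY"], os.environ["LIVEKIT_API_SECRET"]
        )
        .with_identity(learner_id)
        .with_grants(api.VideoGrants(room_join=True, room=room))
        .to_jwt()
    )
    # The client joins this room over WebRTC; the AI tutor agent joins the
    # same room and streams synthesized speech back, so the latency target
    # is met at the media layer rather than over HTTP.
    return {"room": room, "token": token}
```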
@@ -282,7 +282,7 @@ This section describes the core capabilities required for a production-grade Ava
**Acceptance Criteria:**
- Instructors can upload documents and images (e.g., forms, articles, exam prompts, everyday photos) into the system.
-- The backend parses and indexes uploaded material (e.g., via Docling and embeddings) so that AI can reference it during lessons.
+- The backend parses and indexes uploaded material via a document processing and embedding pipeline so that the AI can reference it during lessons (see the sketch after this list).
- Instructors can select target level(s), objectives, and exam formats when creating or editing a lesson.
- AI suggests lesson structures, prompts, and example dialogues that instructors can review and modify before publishing.
- Lessons are stored with metadata (level, skills, topics, exam parts) and become available in the learner curriculum and mock exams.
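
A minimal sketch of the parse-and-index step referenced above, assuming Docling for document conversion, OpenAI embeddings, and psycopg 3 with pgvector; the `lesson_chunks` table, fixed-size chunking, and embedding model are assumptions for illustration, not the project's actual schema or pipeline:

```python
import numpy as np
import psycopg
from docling.document_converter import DocumentConverter
from openai import OpenAI
from pgvector.psycopg import register_vector

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def index_upload(path: str, conn: psycopg.Connection) -> None:
    """Parse one uploaded file and store embedded chunks for AI retrieval."""
    # Docling converts PDFs, DOCX, images, etc. into structured text.
    text = DocumentConverter().convert(path).document.export_to_markdown()

    # Naive fixed-size chunking; a real pipeline would split on document
    # structure (sections, paragraphs) and attach lesson metadata.
    chunks = [text[i : i + 1000] for i in range(0, len(text), 1000)]

    register_vector(conn)  # let psycopg pass numpy arrays as pgvector values
    with conn.cursor() as cur:
        for chunk in chunks:
            emb = client.embeddings.create(
                model="text-embedding-3-small", input=chunk
            ).data[0].embedding
            # Assumed table: lesson_chunks(content text, embedding vector(1536))
            cur.execute(
                "INSERT INTO lesson_chunks (content, embedding) VALUES (%s, %s)",
                (chunk, np.array(emb)),
            )
    conn.commit()
```

Indexed chunks can then be retrieved with a pgvector similarity query (e.g. `ORDER BY embedding <=> %s LIMIT 5`) when the AI needs to reference the uploaded material mid-lesson.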