First improvement pass

2026-02-10 23:32:57 -05:00
parent f65e3d8413
commit c78a8a90a3
26 changed files with 13200 additions and 207 deletions
@@ -0,0 +1,290 @@
+# keydr - Terminal Typing Tutor Architecture Plan
+
+## Context
+
+**Problem**: No terminal-based typing tutor exists that combines keybr.com's adaptive learning algorithm (gradual letter unlocking, per-key confidence tracking, phonetic pseudo-word generation) with code syntax training. Existing tools either lack adaptive learning entirely (ttyper, smassh, typr) or have incomplete implementations (gokeybr intentionally ignores error stats, ivan-volnov/keybr is focused on Anki integration).
+
+**Goal**: Build a full-featured Rust TUI typing tutor that clones keybr.com's core algorithm, extends it to code syntax training, and provides a polished statistics dashboard - all in the terminal.
+
+---
+
+## Research Summary
+
+### keybr.com Algorithm (from reading source: `packages/keybr-lesson/lib/guided.ts`, `keybr-phonetic-model/lib/phoneticmodel.ts`, `keybr-result/lib/keystats.ts`)
+
+**Letter Unlocking**: Letters sorted by frequency. Starts with minimum 6. New letter unlocked only when ALL included keys have `confidence >= 1.0`. Weakest key (lowest confidence) gets "focused" - drills bias heavily toward it.
+
+**Confidence Model**: `confidence = target_time_ms / filtered_time_to_type`, where `target_time_ms = 60000 / target_speed_cpm` (default target: 175 CPM ~ 35 WPM). `filtered_time_to_type` is an exponential moving average (alpha=0.1) of raw per-key typing times.
+
+**Phonetic Word Generation**: Markov chain transition table maps character bigrams to next-character probability distributions. Chain is walked with a `Filter` that restricts to unlocked characters only. Focused letter gets prefix biasing - the generator searches for chain states containing the focused letter and starts from there. Words are 3-10 chars; space probability boosted by `1.3^word_length` to keep words short.
+
+**Scoring**: `score = (speed_cpm * complexity) / (errors + 1) * (length / 50)`
+
+**Learning Rate**: Polynomial regression (degree 1-3 based on sample count) on last 30 per-key time samples, with R^2 threshold of 0.5 for meaningful predictions.
+
+### Key Insights from Prior Art
+
+- **gokeybr**: Trigram-based scoring with `frequency * effort(speed)` is a good complementary approach. Its Bellman-Ford shortest-path for drill generation is clever but complex.
+- **ttyper**: Clean Rust/Ratatui architecture to reference. Uses `crossterm` events, `State::Test | State::Results` enum, `Config` from TOML. Dependencies: `ratatui ^0.25`, `crossterm ^0.27`, `clap`, `serde`, `toml`, `rand`, `rust-embed`.
+- **keybr-code**: Uses PEG grammars to generate code snippets for 12+ languages. Each grammar produces realistic syntax patterns.
+
+---
+
+## Architecture
+
+### Technology Stack
+- **TUI**: Ratatui + Crossterm (the standard Rust TUI stack, battle-tested by ttyper and many others)
+- **CLI**: Clap (derive)
+- **Serialization**: Serde + serde_json + toml
+- **HTTP**: Reqwest (blocking, for GitHub API)
+- **Persistence**: JSON files via `dirs` crate (XDG paths)
+- **Embedded Assets**: rust-embed
+- **Error Handling**: anyhow + thiserror
+- **Time**: chrono
+
+### Project Structure
+
+```
+src/
+  main.rs                        # CLI parsing, terminal init, main event loop
+  app.rs                         # App state machine (TEA pattern), message dispatch
+  event.rs                       # Crossterm event polling thread -> AppMessage channel
+  config.rs                      # Config loading (~/.config/keydr/config.toml)
+
+  engine/
+    mod.rs
+    letter_unlock.rs             # Letter ordering, unlock logic, focus selection
+    key_stats.rs                 # Per-key EMA, confidence, best-time tracking
+    scoring.rs                   # Lesson score formula, gamification (levels, streaks)
+    learning_rate.rs             # Polynomial regression for speed prediction
+    filter.rs                    # Active character set filter
+
+  generator/
+    mod.rs                       # TextGenerator trait
+    phonetic.rs                  # Markov chain pseudo-word generator
+    transition_table.rs          # Binary transition table (de)serialization
+    code_syntax.rs               # PEG grammar interpreter for code snippets
+    passage.rs                   # Book passage loading
+    github_code.rs               # GitHub API code fetching + caching
+
+  session/
+    mod.rs
+    lesson.rs                    # LessonState: target text, cursor, timing
+    input.rs                     # Keystroke processing, match/mismatch, backspace
+    result.rs                    # LessonResult computation from raw events
+
+  store/
+    mod.rs                       # StorageBackend trait
+    json_store.rs                # JSON file persistence with atomic writes
+    schema.rs                    # Serializable data models
+
+  ui/
+    mod.rs                       # Root render dispatcher
+    theme.rs                     # Theme TOML parsing, color resolution
+    layout.rs                    # Responsive screen layout (ratatui Rect splitting)
+    components/
+      mod.rs
+      typing_area.rs             # Main typing widget (correct/incorrect/pending coloring)
+      stats_sidebar.rs           # Live WPM, accuracy, key confidence bars
+      keyboard_diagram.rs        # Visual keyboard with finger colors + focus highlight
+      progress_bar.rs            # Letter unlock progress
+      chart.rs                   # WPM-over-time line charts (ratatui Chart widget)
+      menu.rs                    # Mode selection menu
+      dashboard.rs               # Post-lesson results view
+      stats_dashboard.rs         # Historical statistics with graphs
+
+  keyboard/
+    mod.rs
+    layout.rs                    # KeyboardLayout, key positions, finger assignments
+    finger.rs                    # Finger enum, hand assignment
+
+assets/
+  models/en.bin                  # Pre-built English phonetic transition table
+  themes/*.toml                  # Built-in themes (catppuccin, dracula, gruvbox, nord, etc.)
+  grammars/*.toml                # Code syntax grammars (rust, python, js, go, etc.)
+  layouts/*.toml                 # Keyboard layouts (qwerty, dvorak, colemak)
+```
+
+### Core Data Flow
+
+```
+                    ┌─────────────┐
+                    │  Event Loop │
+                    └──────┬──────┘
+                           │ AppMessage
+                           ▼
+┌──────────┐     ┌─────────────────┐     ┌───────────┐
+│Generator │────▶│    App State    │────▶│  UI Layer  │
+│(phonetic,│     │  (TEA pattern)  │     │ (ratatui)  │
+│ code,    │     │                 │     │            │
+│ passage) │     │ ┌─────────────┐ │     └───────────┘
+└──────────┘     │ │   Engine    │ │
+                 │ │ (key_stats, │ │     ┌───────────┐
+                 │ │  unlock,    │ │────▶│   Store   │
+                 │ │  scoring)   │ │     │ (JSON)    │
+                 │ └─────────────┘ │     └───────────┘
+                 └─────────────────┘
+```
+
+### App State Machine
+
+```
+Start → Menu
+Menu → Lesson (on mode select)
+Menu → StatsDashboard (on 's')
+Menu → Settings (on 'c')
+Lesson → LessonResult (on completion or ESC)
+LessonResult → Lesson (on 'r' retry)
+LessonResult → Menu (on 'q'/ESC)
+LessonResult → StatsDashboard (on 's')
+StatsDashboard → Menu (on ESC)
+Settings → Menu (on ESC, saves config)
+Any → Quit (on Ctrl+C)
+```
+
+### The Adaptive Algorithm
+
+**Step 1 - Letter Order**: English frequency order: `e t a o i n s h r d l c u m w f g y p b v k j x q z`
+
+**Step 2 - Unlock Logic** (after each lesson):
+```
+min_letters = 6
+for each letter in frequency_order:
+    if included.len() < min_letters:
+        include(letter)
+    elif all included keys have confidence >= 1.0:
+        include(letter)
+    else:
+        break
+```
+
+**Step 3 - Focus Selection**:
+```
+focused = included_keys
+    .filter(|k| k.confidence < 1.0)
+    .min_by(|a, b| a.confidence.cmp(&b.confidence))
+```
+
+**Step 4 - Stats Update** (per key, after each lesson):
+```
+alpha = 0.1
+stat.filtered_time = alpha * new_time + (1 - alpha) * stat.filtered_time
+stat.best_time = min(stat.best_time, stat.filtered_time)
+stat.confidence = (60000.0 / target_speed_cpm) / stat.filtered_time
+```
+
+**Step 5 - Text Generation Biasing**:
+- Only allow characters in the unlocked set (Filter)
+- When a focused letter exists, find Markov chain prefixes containing it and start words from those prefixes
+- This naturally creates words heavy in the weak letter
+
+### Code Syntax Extension
+
+After all 26 prose letters are unlocked, the system transitions to code syntax training:
+- Introduces code-relevant characters: `{ } [ ] ( ) < > ; : . , = + - * / & | ! ? _ " ' # @ \ ~ ^ %`
+- Uses PEG grammars per language to generate realistic code snippets
+- Gradual character unlocking continues for syntax characters
+- Users select their target programming languages in config
+
+### Theme System
+
+Themes are TOML files with semantic color names:
+```toml
+[colors]
+bg = "#1e1e2e"
+text_correct = "#a6e3a1"
+text_incorrect = "#f38ba8"
+text_pending = "#585b70"
+text_cursor_bg = "#f5e0dc"
+focused_key = "#f9e2af"
+# ... etc
+```
+
+Resolution order: CLI flag → config → user themes dir → bundled → default fallback.
+
+Built-in themes: Catppuccin Mocha, Catppuccin Latte, Dracula, Gruvbox Dark, Nord, Tokyo Night, Solarized Dark, One Dark, plus an ANSI-safe default.
+
+### Persistence
+
+JSON files in `~/.local/share/keydr/`:
+- `key_stats.json` - Per-key EMA, confidence, sample history
+- `lesson_history.json` - Last 500 lesson results
+- `profile.json` - Unlock state, settings, gamification data
+
+Atomic writes (temp file → fsync → rename) to prevent corruption. Schema version field for forward-compatible migrations.
+
+---
+
+## Implementation Phases
+
+### Phase 1: Foundation (Core Loop + Basic Typing)
+Create the terminal init/restore with crossterm, event polling thread, TEA-based App state machine, basic typing against a hardcoded word list with correct/incorrect coloring.
+
+**Key files**: `main.rs`, `app.rs`, `event.rs`, `session/lesson.rs`, `session/input.rs`, `ui/components/typing_area.rs`, `ui/layout.rs`
+
+### Phase 2: Adaptive Engine + Statistics
+Implement per-key stats (EMA, confidence), letter unlocking, focus selection, scoring, live stats sidebar, and progress bar.
+
+**Key files**: `engine/key_stats.rs`, `engine/letter_unlock.rs`, `engine/scoring.rs`, `engine/filter.rs`, `session/result.rs`, `ui/components/stats_sidebar.rs`, `ui/components/progress_bar.rs`
+
+### Phase 3: Phonetic Text Generation
+Build the English transition table (offline tool or build script), implement the Markov chain walker with filter and focus biasing, integrate with the lesson system.
+
+**Key files**: `generator/transition_table.rs`, `generator/phonetic.rs`, `generator/mod.rs`, a `build.rs` or `tools/` script for table generation
+
+### Phase 4: Persistence + Theming
+JSON storage backend, atomic writes, config loading, theme parsing, built-in theme files, apply themes throughout all UI components.
+
+**Key files**: `store/json_store.rs`, `store/schema.rs`, `config.rs`, `ui/theme.rs`, `assets/themes/*.toml`
+
+### Phase 5: Results + Dashboard
+Post-lesson results screen, historical stats dashboard with charts (ratatui Chart widget), learning rate prediction.
+
+**Key files**: `ui/components/dashboard.rs`, `ui/components/stats_dashboard.rs`, `ui/components/chart.rs`, `engine/learning_rate.rs`
+
+### Phase 6: Code Practice + Passages
+PEG grammar interpreter for code syntax generation, book passage mode, GitHub code fetching + caching.
+
+**Key files**: `generator/code_syntax.rs`, `generator/passage.rs`, `generator/github_code.rs`, `assets/grammars/*.toml`
+
+### Phase 7: Keyboard Diagram + Layouts
+Visual keyboard widget with finger color coding, multiple layout support (QWERTY, Dvorak, Colemak).
+
+**Key files**: `keyboard/layout.rs`, `keyboard/finger.rs`, `ui/components/keyboard_diagram.rs`, `assets/layouts/*.toml`
+
+### Phase 8: Polish + Gamification
+Level system, streaks, badges, CLI completeness, error handling, performance, testing, documentation.
+
+---
+
+## Verification
+
+After each phase, verify by:
+1. `cargo build` compiles without errors
+2. `cargo test` passes all unit tests
+3. Manual testing: launch `cargo run`, exercise the new features, verify UI rendering
+4. For Phase 2+: verify letter unlocking by typing accurately and watching new letters appear
+5. For Phase 3+: verify generated words only contain unlocked letters and bias toward the focused key
+6. For Phase 4+: verify stats persist across app restarts
+7. For Phase 5+: verify charts render correctly with historical data
+
+---
+
+## Dependencies (Cargo.toml)
+
+```toml
+[dependencies]
+ratatui = "0.30"
+crossterm = "0.28"
+clap = { version = "4.5", features = ["derive"] }
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1.0"
+toml = "0.8"
+rand = { version = "0.8", features = ["small_rng"] }
+reqwest = { version = "0.12", features = ["json", "blocking"] }
+dirs = "6.0"
+rust-embed = "8.5"
+chrono = { version = "0.4", features = ["serde"] }
+anyhow = "1.0"
+thiserror = "2.0"
+```
@@ -0,0 +1,188 @@
+# keydr Improvement Plan
+
+## Context
+
+The app was built in a single first-pass implementation. Six issues need addressing: a broken settings menu, low-contrast pending text, poor phonetic word quality, a bare-bones stats dashboard, an undersized keyboard visualization, and hardcoded passage/code content.
+
+---
+
+## Issue 1: Settings Menu Not Working
+
+**Root cause**: In `main.rs:handle_menu_key`, the `Enter` match handles `0..=3` but Settings is menu item index `4` — it falls through to `_ => {}`. Also no `KeyCode::Char('c')` shortcut handler exists.
+
+**Fix** (`src/main.rs:124-158`):
+- Add `4 => app.screen = AppScreen::Settings` in the Enter match arm
+- Add `KeyCode::Char('c') => app.screen = AppScreen::Settings` handler
+
+**Additionally**, make the Settings screen functional instead of a stub:
+- Make it an interactive form with arrow keys to select fields, Enter to cycle values
+- Fields: Target WPM (adjustable ±5), Theme (cycle through available), Word Count, Code Languages
+- Save config on ESC via existing `Config::save()`
+- New file: no new files needed; extend `render_settings` in `main.rs` and add `handle_settings_key` logic
+- Add `settings_selected: usize` and `settings_editing: bool` fields to `App`
+
+---
+
+## Issue 2: Low Contrast Pending Text
+
+**Root cause**: `text_pending` = `#585b70` (Catppuccin overlay0) on bg `#1e1e2e` is too dim for readable upcoming text.
+
+**Fix**: Change `text_pending` in the default theme and all bundled theme files:
+- `src/ui/theme.rs:92` default: `#585b70` → `#a6adc8` (Catppuccin subtext0, much brighter)
+- Update all 8 theme TOML files in `assets/themes/` with appropriate brighter pending text colors for each theme
+
+---
+
+## Issue 3: Better Phonetic Word Generation
+
+**Root cause**: Our current approach uses a hand-built order-2 trigram table from ~50 English patterns. keybr.com uses:
+1. **Order-4 Markov chain** trained on top 10,000 words from a real frequency dictionary
+2. **Pre-built binary model** (~47KB for English)
+3. **Real word dictionary** — the `naturalWords` mode (keybr's default) primarily uses real English words filtered by unlocked letters, falling back to phonetic pseudo-words only when <15 words match
+
+**Implementation plan**:
+
+### Step A: Build a proper transition table from a word frequency list
+- Create `tools/build_model.rs` (a build-time binary) that:
+  1. Reads an English word frequency list (we'll embed a curated 10K-word list as `assets/wordfreq-en.csv`)
+  2. Uses order-4 chain (matching keybr)
+  3. Appends each word weighted by frequency (like keybr's `builder.append()` loop)
+  4. Outputs binary `.data` file matching keybr's format
+- **OR simpler approach**: Embed the word list directly and build the table at startup (it's fast enough)
+
+### Step B: Upgrade TransitionTable to order-4
+- Modify `TransitionTable` to support variable-order chains
+- Change the key from `(char, char)` → a `Vec<char>` prefix of length `order - 1`
+- Implement `segment(prefix: &[char])` matching keybr's approach
+
+### Step C: Add a word dictionary for "natural words" mode
+- Create `src/generator/dictionary.rs` with a `Dictionary` struct
+- Embed a 10K English word list (JSON or plain text) via rust-embed
+- `Dictionary::find(filter: &CharFilter, focused: Option<char>) -> Vec<&str>` returns real words where all characters are in the allowed set
+- If focused letter exists, prefer words containing it
+
+### Step D: Update PhoneticGenerator to use combined approach (like keybr's GuidedLesson)
+- When `naturalWords` is enabled (default):
+  1. Get real words matching the filter from Dictionary
+  2. If >= 15 real words available, randomly pick from them
+  3. Otherwise, supplement with phonetic pseudo-words from the Markov chain
+- This is what makes keybr's output "feel like real words" — because they mostly ARE real words
+
+**Key files to modify**:
+- `src/generator/transition_table.rs` — upgrade to order-4
+- `src/generator/phonetic.rs` — update word generation loop
+- New: `src/generator/dictionary.rs` — real word dictionary
+- New: `assets/words-en.json` — embedded 10K word list (we can extract from keybr's `clones/keybr.com/packages/keybr-content-words/lib/data/words-en.json`)
+- `src/app.rs` — wire up dictionary
+
+---
+
+## Issue 4: Comprehensive Statistics Dashboard
+
+**Current state**: Single screen with 4 summary numbers and 1 unlabeled WPM line chart.
+
+**Target** (inspired by typr's three-tab layout):
+
+### Tab navigation
+- Add tab state to the stats dashboard: `Dashboard | History | Keystrokes`
+- Keyboard: `D`, `H`, `K` to switch tabs, or `Tab` to cycle
+- Render tabs as a header row with active tab highlighted
+
+### Dashboard Tab
+1. **Summary stats row**: Total lessons, Avg WPM, Best WPM, Avg Accuracy, Total time, Streak
+2. **Progress bars** (3 columns): WPM vs goal, Accuracy vs 100%, Level progress
+3. **WPM over time chart** (line chart, last 50 lessons) — already exists, add axis labels
+4. **Accuracy over time chart** (line chart, last 50 lessons) — new chart
+
+### History Tab
+1. **Recent tests table**: Last 20 lessons with columns: #, WPM, Raw WPM, Accuracy, Time, Date
+2. **Per-key average speed chart**: Bar chart of all 26 letters by avg typing time
+
+### Keystrokes Tab
+1. **Keyboard accuracy heatmap**: Render keyboard layout with per-key accuracy coloring (green=100%, yellow=90-100%, red=<90%)
+2. **Slowest/Fastest keys tables**: Top 5 each with average time in ms
+3. **Word/Character stats**: Total correct/wrong counts
+
+**Key files to modify/create**:
+- `src/ui/components/stats_dashboard.rs` — complete rewrite with tabs
+- `src/ui/components/chart.rs` — add AccuracyChart, BarChart widgets
+- New: `src/ui/components/keyboard_heatmap.rs` — per-key accuracy visualization
+- `src/engine/key_stats.rs` — ensure per-key accuracy tracking exists (not just timing)
+- `src/session/result.rs` — ensure per-key accuracy data is persisted
+- `src/store/schema.rs` — may need to add per-key accuracy to KeyStatsData
+
+---
+
+## Issue 5: Keyboard Visualization Too Small
+
+**Current state**: Keyboard diagram IS rendered in `render_lesson` (`main.rs:330-335`) but given only `Constraint::Length(4)` — with borders that's 2 inner rows, but QWERTY needs 3 rows.
+
+**Fix**:
+- Change keyboard constraint from `Length(4)` to `Length(5)` in `main.rs:316`
+- Improve the keyboard rendering in `keyboard_diagram.rs`:
+  - Use wider keys (5 chars instead of 4) for readability
+  - Add finger-color coding (reuse existing `keyboard/finger.rs`)
+  - Show the next key to type highlighted (pass current target char)
+  - Improve spacing/centering
+
+**Files**: `src/main.rs:311-318`, `src/ui/components/keyboard_diagram.rs`
+
+---
+
+## Issue 6: Embedded + Internet Content (Both Approaches)
+
+### Embedded Baseline (always available, no network)
+- Bundle ~50 passages from public domain literature directly in binary (via rust-embed)
+- Bundle ~100 code snippets per language (Rust, Python, JS, Go) in embedded assets
+- These replace the current ~15 hardcoded passages and ~12 code snippets per language
+
+### Internet Fetching (on top of embedded, with caching)
+
+**Passages: Project Gutenberg**
+- Fetch from `https://www.gutenberg.org/cache/epub/{id}/pg{id}.txt`
+- Curate ~20 popular book IDs (Pride and Prejudice, Alice in Wonderland, etc.)
+- Extract random paragraphs (skip Gutenberg header/footer boilerplate)
+- Cache fetched books to `~/.local/share/keydr/passages/`
+- Gracefully fall back to embedded passages on network failure
+
+**Code: GitHub Raw Files**
+- Fetch raw files from curated popular repos (e.g., `tokio-rs/tokio`, `python/cpython`)
+- Use direct raw.githubusercontent.com URLs for specific files (no API auth needed)
+- Extract function-length snippets (20-50 lines)
+- Cache to `~/.local/share/keydr/code_cache/`
+- Gracefully fall back to embedded snippets on failure
+
+**New dependency**: `reqwest = { version = "0.12", features = ["json", "blocking"] }`
+
+**Files to modify**:
+- `src/generator/passage.rs` — expand embedded + add Gutenberg fetching
+- `src/generator/code_syntax.rs` — expand embedded + add GitHub fetching
+- New: `src/generator/cache.rs` — shared disk caching logic
+- New: `assets/passages/*.txt` — embedded passage files
+- New: `assets/code/*.rs`, `*.py`, etc. — embedded code snippet files
+- `Cargo.toml` — add reqwest dependency
+
+---
+
+## Implementation Order
+
+1. **Issue 1** (Settings menu fix) — quick fix, unblocks testing
+2. **Issue 2** (Text contrast) — quick theme change
+3. **Issue 5** (Keyboard size) — quick layout fix
+4. **Issue 3** (Word generation) — medium complexity, core improvement
+5. **Issue 4** (Stats dashboard) — large UI rewrite
+6. **Issue 6** (Internet content) — medium complexity, requires new dependency
+
+---
+
+## Verification
+
+1. `cargo build` — compiles without errors
+2. `cargo test` — all tests pass
+3. Manual testing for each issue:
+   - Settings: navigate to Settings in menu, change target WPM, verify it saves/loads
+   - Contrast: verify pending text is readable in the typing area
+   - Keyboard: verify all 3 QWERTY rows visible during lesson
+   - Words: start adaptive mode, verify words look like real English
+   - Stats: complete 2-3 lessons, check all three stats tabs render correctly
+   - Passages: start passage mode, verify it fetches new content (with network), and falls back gracefully (without)