N-gram metrics overhaul & UI improvements

2026-02-26 01:26:25 -05:00
parent e7f57dd497
commit 54ddebf054
23 changed files with 3812 additions and 1008 deletions
--- a/docs/plans/2026-02-24-n-grams-statistics-tab.md
+++ b/docs/plans/2026-02-24-n-grams-statistics-tab.md
@@ -0,0 +1,221 @@
+# Plan: N-grams Statistics Tab
+
+## Context
+
+The n-gram error tracking system (last commit `e7f57dd`) tracks bigram/trigram transition difficulties and uses them to adapt drill selection. However, there's no visibility into what the system has identified as weak or how it's influencing drills. This plan adds a **[6] N-grams** tab to the Statistics page to surface this data.
+
+---
+
+## Layout
+
+```
+ [1] Dashboard  [2] History  [3] Activity  [4] Accuracy  [5] Timing  [6] N-grams
+
+ ┌─ Active Focus ──────────────────────────────────────────────────────────────┐
+ │  Focus: Bigram "th"  (difficulty: 1.24)                                    │
+ │  Bigram diff 1.24 > char 'n' diff 0.50 x 0.8 threshold                    │
+ └─────────────────────────────────────────────────────────────────────────────┘
+ ┌─ Eligible Bigrams (3) ────────────────┐┌─ Watchlist ─────────────────────────┐
+ │ Pair  Diff  Err%  Exp%  Red  Conf  N  ││ Pair  Red   Samples   Streak       │
+ │  th   1.24  18%    7%  2.10  0.41  32 ││  er   1.82   14/20     2/3         │
+ │  ed   0.89  22%    9%  1.90  0.53  28 ││  in   1.61    8/20     1/3         │
+ │  ng   0.72  14%    8%  1.72  0.58  24 ││  ou   1.53   18/20     1/3         │
+ └────────────────────────────────────────┘└───────────────────────────────────┘
+ Scope: Global | Bigrams: 142 | Trigrams: 387 | Hesitation: >832ms | Tri-gain: 12.0%
+
+ [ESC] Back  [Tab] Next tab  [1-6] Switch tab
+```
+
+---
+
+## Scope Decisions
+
+- **Drill scope**: Tab shows data for `app.drill_scope` (current adaptive scope). A scope label in the summary line makes this explicit (e.g., "Scope: Global" or "Scope: Branch: lowercase").
+- **Trigram gain**: Sourced from `app.trigram_gain_history` (computed every 50 ranked drills). Always from ranked stats, consistent with bigram/trigram counts shown. The value is a fraction in `[0.0, 1.0]` (count of signal trigrams / total qualified trigrams), so it is mathematically non-negative. Format: `X.X%` (one decimal). When empty: `--` with note "(computed every 50 drills)".
+- **Eligible vs Watchlist**: Strictly disjoint by construction. Watchlist filter explicitly excludes bigrams that pass all eligibility gates.
+
+---
+
+## Layer Boundaries
+
+Domain logic (engine) and presentation (UI) are separated:
+
+- **Engine** (`ngram_stats.rs`): Owns `FocusReasoning` (domain decision explanation), `select_focus_target_with_reasoning()`, filtering/gating/sorting logic for eligible and watchlist bigrams. Returns domain-oriented results.
+- **UI** (`stats_dashboard.rs`): Owns `NgramTabData`, `EligibleBigramRow`, `WatchlistBigramRow` (view model structs tailored for rendering columns).
+- **Adapter** (`main.rs`): `build_ngram_tab_data()` is the single point that translates engine output → UI view models. All stats store lookups for display columns happen here.
+
+---
+
+## Files to Modify
+
+### 1. `src/engine/ngram_stats.rs` — Domain logic + focus reasoning
+
+**`FocusReasoning` enum** (domain concept — why the target was selected):
+```rust
+pub enum FocusReasoning {
+    BigramWins {
+        bigram_difficulty: f64,
+        char_difficulty: f64,
+        char_key: Option<char>,  // None when no focused char exists
+    },
+    CharWins {
+        char_key: char,
+        char_difficulty: f64,
+        bigram_best: Option<(BigramKey, f64)>,
+    },
+    NoBigrams { char_key: char },
+    Fallback,
+}
+```
+
+**`select_focus_target_with_reasoning()`** — Unified function returning `(FocusTarget, FocusReasoning)`. Internally calls `focused_key()` and `weakest_bigram()` once. Handles all four match arms without synthetic values.
+
+**`focus_eligible_bigrams()`** on `BigramStatsStore` — Returns `Vec<(BigramKey, f64 /*difficulty*/, f64 /*redundancy*/)>` sorted by `(difficulty desc, redundancy desc, key lexical asc)`. Same gating as `weakest_bigram()`: sample >= `MIN_SAMPLES_FOR_FOCUS`, streak >= `STABILITY_STREAK_REQUIRED`, redundancy > `STABILITY_THRESHOLD`, difficulty > 0. Returns ALL qualifying entries (no truncation — UI handles truncation to available height).
+
+**`watchlist_bigrams()`** on `BigramStatsStore` — Returns `Vec<(BigramKey, f64 /*redundancy*/)>` sorted by `(redundancy desc, key lexical asc)`. Criteria: redundancy > `STABILITY_THRESHOLD`, sample_count >= 3 (noise floor), AND NOT fully eligible. Returns ALL qualifying entries.
+
+**Export constants** — Make `MIN_SAMPLES_FOR_FOCUS` and `STABILITY_STREAK_REQUIRED` `pub(crate)` so the adapter in `main.rs` can pass them into `NgramTabData` without duplicating values.
+
+### 2. `src/ui/components/stats_dashboard.rs` — View models + rendering
+
+**View model structs** (presentation-oriented, mapped from engine data by adapter):
+
+```rust
+pub struct EligibleBigramRow {
+    pub pair: String,          // e.g., "th"
+    pub difficulty: f64,
+    pub error_rate_pct: f64,   // smoothed, as percentage
+    pub expected_rate_pct: f64,// from char independence, as percentage
+    pub redundancy: f64,
+    pub confidence: f64,
+    pub sample_count: usize,
+}
+
+pub struct WatchlistBigramRow {
+    pub pair: String,
+    pub redundancy: f64,
+    pub sample_count: usize,
+    pub redundancy_streak: u8,
+}
+```
+
+**`NgramTabData` struct** (assembled by `build_ngram_tab_data()` in main.rs):
+```rust
+pub struct NgramTabData {
+    pub focus_target: FocusTarget,
+    pub focus_reasoning: FocusReasoning,
+    pub eligible: Vec<EligibleBigramRow>,
+    pub watchlist: Vec<WatchlistBigramRow>,
+    pub total_bigrams: usize,
+    pub total_trigrams: usize,
+    pub hesitation_threshold_ms: f64,
+    pub latest_trigram_gain: Option<f64>,
+    pub scope_label: String,
+    // Engine thresholds for watchlist progress denominators:
+    pub min_samples_for_focus: usize,     // from ngram_stats::MIN_SAMPLES_FOR_FOCUS
+    pub stability_streak_required: u8,    // from ngram_stats::STABILITY_STREAK_REQUIRED
+}
+```
+
+**Add field** to `StatsDashboard`: `ngram_data: Option<&'a NgramTabData>`
+
+**Update constructor**, tab header (add `"[6] N-grams"`), footer (`[1-6]`), `render_tab()` dispatch.
+
+**Rendering methods:**
+
+- **`render_ngram_tab()`** — Vertical layout: focus (4 lines), lists (Min 5), summary (2 lines).
+
+- **`render_ngram_focus()`** — Bordered "Active Focus" block.
+  - Line 1: target name in `colors.focused_key()` + bold
+  - Line 2: reasoning in `colors.text_pending()`
+  - When BigramWins + char_key is None: "Bigram selected (no individual char weakness found)"
+  - Empty state: "Complete some adaptive drills to see focus data"
+
+- **`render_eligible_bigrams()`** — Bordered "Eligible Bigrams (N)" block.
+  - Header in `colors.accent()` + bold
+  - Rows colored by difficulty: `error()` (>1.0), `warning()` (>0.5), `success()` (<=0.5)
+  - Columns: `Pair  Diff  Err%  Exp%  Red  Conf  N`
+  - Narrow (<38 inner): drop Exp% and Conf
+  - Truncate rows to available height
+  - Empty state: "No bigrams meet focus criteria yet"
+
+- **`render_watchlist_bigrams()`** — Bordered "Watchlist" block.
+  - Columns: `Pair  Red  Samples  Streak`
+  - Samples rendered as `n/{data.min_samples_for_focus}`, Streak as `n/{data.stability_streak_required}` — denominators sourced from `NgramTabData` (engine constants), never hardcoded in UI
+  - All rows in `colors.warning()`
+  - Truncate rows to available height
+  - Empty state: "No approaching bigrams"
+
+- **`render_ngram_summary()`** — Single line: scope label, bigram/trigram counts, hesitation threshold, trigram gain.
+
+### 3. `src/main.rs` — Input handling + adapter
+
+**`handle_stats_key()`**:
+- `STATS_TAB_COUNT`: 5 → 6
+- Add `KeyCode::Char('6') => app.stats_tab = 5` in both branches
+
+**`build_ngram_tab_data(app: &App) -> NgramTabData`** — Dedicated adapter function (single point of engine→UI translation):
+- Calls `select_focus_target_with_reasoning()`
+- Calls `focus_eligible_bigrams()` and `watchlist_bigrams()`
+- Maps engine results to `EligibleBigramRow`/`WatchlistBigramRow` by looking up additional per-bigram stats (error rate, expected rate, confidence, streak) from `app.ranked_bigram_stats` and `app.ranked_key_stats`
+- Builds scope label from `app.drill_scope`
+- Only called when `app.stats_tab == 5`
+
+**`render_stats()`**: Call `build_ngram_tab_data()` when on tab 5, pass `Some(&data)` to StatsDashboard.
+
+---
+
+## Implementation Order
+
+1. Add `FocusReasoning` enum and `select_focus_target_with_reasoning()` to `ngram_stats.rs`
+2. Add `focus_eligible_bigrams()` and `watchlist_bigrams()` to `BigramStatsStore`
+3. Add unit tests for steps 1-2
+4. Add view model structs (`EligibleBigramRow`, `WatchlistBigramRow`, `NgramTabData`) and `ngram_data` field to `stats_dashboard.rs`
+5. Add all rendering methods to `stats_dashboard.rs`
+6. Update tab header, footer, `render_tab()` dispatch in `stats_dashboard.rs`
+7. Add `build_ngram_tab_data()` adapter + update `render_stats()` in `main.rs`
+8. Update `handle_stats_key()` in `main.rs`
+
+---
+
+## Verification
+
+### Unit Tests (in `ngram_stats.rs` test module)
+
+**`test_focus_eligible_bigrams_gating`** — BigramStatsStore with bigrams at boundary conditions:
+- sample=25, streak=3, redundancy=2.0 → eligible
+- sample=15, streak=3, redundancy=2.0 → excluded (samples < 20)
+- sample=25, streak=2, redundancy=2.0 → excluded (streak < 3)
+- sample=25, streak=3, redundancy=1.2 → excluded (redundancy <= 1.5)
+- sample=25, streak=3, redundancy=2.0, confidence=1.5 → excluded (difficulty <= 0)
+
+**`test_focus_eligible_bigrams_ordering_and_tiebreak`** — 3 eligible bigrams: two with same difficulty but different redundancy, one with lower difficulty. Verify sorted by (difficulty desc, redundancy desc, key lexical asc).
+
+**`test_watchlist_bigrams_gating`** — Bigrams at boundary:
+- Fully eligible (sample=25, streak=3) → excluded (goes to eligible list)
+- High redundancy, low samples (sample=10) → included
+- High redundancy, low streak (sample=25, streak=1) → included
+- Low redundancy (1.3) → excluded
+- Very few samples (sample=2) → excluded (< 3 noise floor)
+
+**`test_watchlist_bigrams_ordering_and_tiebreak`** — 3 watchlist entries: two with same redundancy. Verify sorted by (redundancy desc, key lexical asc).
+
+**`test_select_focus_with_reasoning_bigram_wins`** — Bigram difficulty > char difficulty * 0.8. Returns `BigramWins` with correct values and `char_key: Some(ch)`.
+
+**`test_select_focus_with_reasoning_char_wins`** — Char difficulty high, bigram < threshold. Returns `CharWins` with `bigram_best` populated.
+
+**`test_select_focus_with_reasoning_no_bigrams`** — No eligible bigrams. Returns `NoBigrams`.
+
+**`test_select_focus_with_reasoning_bigram_only`** — No focused char, bigram exists. Returns `BigramWins` with `char_key: None`.
+
+### Build & Existing Tests
+- `cargo build` — no compile errors
+- `cargo test` — all existing + new tests pass
+
+### Manual Testing
+- Navigate to Statistics → press [6] → see N-grams tab
+- Tab/BackTab cycles through all 6 tabs
+- With no drill history: empty states shown for all panels
+- After several adaptive drills: eligible bigrams appear with plausible data
+- Scope label reflects current drill scope
+- Verify layout at 80x24 terminal size — confirm column drop at narrow widths keeps header/data aligned