N-gram metrics overhaul & UI improvements

2026-02-26 01:26:25 -05:00
parent e7f57dd497
commit 54ddebf054
23 changed files with 3812 additions and 1008 deletions


@@ -0,0 +1,221 @@
# Plan: N-grams Statistics Tab
## Context
The n-gram error tracking system (last commit `e7f57dd`) tracks bigram/trigram transition difficulties and uses them to adapt drill selection. However, there's no visibility into what the system has identified as weak or how it's influencing drills. This plan adds a **[6] N-grams** tab to the Statistics page to surface this data.
---
## Layout
```
[1] Dashboard [2] History [3] Activity [4] Accuracy [5] Timing [6] N-grams
┌─ Active Focus ──────────────────────────────────────────────────────────────┐
│ Focus: Bigram "th" (difficulty: 1.24) │
│ Bigram diff 1.24 > char 'n' diff 0.50 x 0.8 threshold │
└─────────────────────────────────────────────────────────────────────────────┘
┌─ Eligible Bigrams (3) ────────────────┐┌─ Watchlist ─────────────────────────┐
│ Pair Diff Err% Exp% Red Conf N ││ Pair Red Samples Streak │
│ th 1.24 18% 7% 2.10 0.41 32 ││ er 1.82 14/20 2/3 │
│ ed 0.89 22% 9% 1.90 0.53 28 ││ in 1.61 8/20 1/3 │
│ ng 0.72 14% 8% 1.72 0.58 24 ││ ou 1.53 18/20 1/3 │
└────────────────────────────────────────┘└───────────────────────────────────┘
Scope: Global | Bigrams: 142 | Trigrams: 387 | Hesitation: >832ms | Tri-gain: 12.0%
[ESC] Back [Tab] Next tab [1-6] Switch tab
```
---
## Scope Decisions
- **Drill scope**: Tab shows data for `app.drill_scope` (current adaptive scope). A scope label in the summary line makes this explicit (e.g., "Scope: Global" or "Scope: Branch: lowercase").
- **Trigram gain**: Sourced from `app.trigram_gain_history` (computed every 50 ranked drills). Always from ranked stats, consistent with bigram/trigram counts shown. The value is a fraction in `[0.0, 1.0]` (count of signal trigrams / total qualified trigrams), so it is mathematically non-negative. Format: `X.X%` (one decimal). When empty: `--` with note "(computed every 50 drills)".
- **Eligible vs Watchlist**: Strictly disjoint by construction. Watchlist filter explicitly excludes bigrams that pass all eligibility gates.
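The trigram-gain formatting rule above can be sketched as a small helper (a minimal sketch; `format_trigram_gain` is a hypothetical name, not an existing function):

```rust
/// Hypothetical helper: format the latest trigram gain per the rules above.
/// `gain` is a fraction in [0.0, 1.0]; `None` means no 50-drill window has
/// completed yet.
fn format_trigram_gain(gain: Option<f64>) -> String {
    match gain {
        Some(g) => format!("{:.1}%", g * 100.0),
        None => "-- (computed every 50 drills)".to_string(),
    }
}
```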
---
## Layer Boundaries
Domain logic (engine) and presentation (UI) are separated:
- **Engine** (`ngram_stats.rs`): Owns `FocusReasoning` (domain decision explanation), `select_focus_target_with_reasoning()`, filtering/gating/sorting logic for eligible and watchlist bigrams. Returns domain-oriented results.
- **UI** (`stats_dashboard.rs`): Owns `NgramTabData`, `EligibleBigramRow`, `WatchlistBigramRow` (view model structs tailored for rendering columns).
- **Adapter** (`main.rs`): `build_ngram_tab_data()` is the single point that translates engine output → UI view models. All stats store lookups for display columns happen here.
---
## Files to Modify
### 1. `src/engine/ngram_stats.rs` — Domain logic + focus reasoning
**`FocusReasoning` enum** (domain concept — why the target was selected):
```rust
pub enum FocusReasoning {
BigramWins {
bigram_difficulty: f64,
char_difficulty: f64,
char_key: Option<char>, // None when no focused char exists
},
CharWins {
char_key: char,
char_difficulty: f64,
bigram_best: Option<(BigramKey, f64)>,
},
NoBigrams { char_key: char },
Fallback,
}
```
**`select_focus_target_with_reasoning()`** — Unified function returning `(FocusTarget, FocusReasoning)`. Internally calls `focused_key()` and `weakest_bigram()` once. Handles all four match arms without synthetic values.
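A minimal sketch of the four-arm decision, using simplified stand-ins for the engine types: the `focused_key()` and `weakest_bigram()` results are modeled as plain tuples, `BigramKey` as `[char; 2]`, and the fallback key `'f'` is a placeholder. The 0.8 factor comes from the threshold shown in the layout mock.

```rust
type BigramKey = [char; 2];

#[derive(Debug, PartialEq)]
enum FocusTarget { Char(char), Bigram(BigramKey) }

#[derive(Debug, PartialEq)]
enum FocusReasoning {
    BigramWins { bigram_difficulty: f64, char_difficulty: f64, char_key: Option<char> },
    CharWins { char_key: char, char_difficulty: f64, bigram_best: Option<(BigramKey, f64)> },
    NoBigrams { char_key: char },
    Fallback,
}

// Bigram wins when its difficulty exceeds char difficulty * 0.8.
const BIGRAM_WIN_FACTOR: f64 = 0.8;

fn select_focus_target_with_reasoning(
    focused_char: Option<(char, f64)>,        // from focused_key()
    weakest_bigram: Option<(BigramKey, f64)>, // from weakest_bigram()
) -> (FocusTarget, FocusReasoning) {
    match (focused_char, weakest_bigram) {
        (Some((ch, cd)), Some((bk, bd))) if bd > cd * BIGRAM_WIN_FACTOR => (
            FocusTarget::Bigram(bk),
            FocusReasoning::BigramWins { bigram_difficulty: bd, char_difficulty: cd, char_key: Some(ch) },
        ),
        (Some((ch, cd)), best) => (
            FocusTarget::Char(ch),
            match best {
                Some(_) => FocusReasoning::CharWins { char_key: ch, char_difficulty: cd, bigram_best: best },
                None => FocusReasoning::NoBigrams { char_key: ch },
            },
        ),
        (None, Some((bk, bd))) => (
            FocusTarget::Bigram(bk),
            // No focused char: char_difficulty has no baseline, recorded as 0.0 here.
            FocusReasoning::BigramWins { bigram_difficulty: bd, char_difficulty: 0.0, char_key: None },
        ),
        // Placeholder fallback target; the real fallback behavior is engine-defined.
        (None, None) => (FocusTarget::Char('f'), FocusReasoning::Fallback),
    }
}
```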
**`focus_eligible_bigrams()`** on `BigramStatsStore` — Returns `Vec<(BigramKey, f64 /*difficulty*/, f64 /*redundancy*/)>` sorted by `(difficulty desc, redundancy desc, key lexical asc)`. Same gating as `weakest_bigram()`: sample >= `MIN_SAMPLES_FOR_FOCUS`, streak >= `STABILITY_STREAK_REQUIRED`, redundancy > `STABILITY_THRESHOLD`, difficulty > 0. Returns ALL qualifying entries (no truncation — UI handles truncation to available height).
**`watchlist_bigrams()`** on `BigramStatsStore` — Returns `Vec<(BigramKey, f64 /*redundancy*/)>` sorted by `(redundancy desc, key lexical asc)`. Criteria: redundancy > `STABILITY_THRESHOLD`, sample_count >= 3 (noise floor), AND NOT fully eligible. Returns ALL qualifying entries.
**Export constants** — Make `MIN_SAMPLES_FOR_FOCUS` and `STABILITY_STREAK_REQUIRED` `pub(crate)` so the adapter in `main.rs` can pass them into `NgramTabData` without duplicating values.
### 2. `src/ui/components/stats_dashboard.rs` — View models + rendering
**View model structs** (presentation-oriented, mapped from engine data by adapter):
```rust
pub struct EligibleBigramRow {
pub pair: String, // e.g., "th"
pub difficulty: f64,
pub error_rate_pct: f64, // smoothed, as percentage
pub expected_rate_pct: f64, // from char independence, as percentage
pub redundancy: f64,
pub confidence: f64,
pub sample_count: usize,
}
pub struct WatchlistBigramRow {
pub pair: String,
pub redundancy: f64,
pub sample_count: usize,
pub redundancy_streak: u8,
}
```
**`NgramTabData` struct** (assembled by `build_ngram_tab_data()` in main.rs):
```rust
pub struct NgramTabData {
pub focus_target: FocusTarget,
pub focus_reasoning: FocusReasoning,
pub eligible: Vec<EligibleBigramRow>,
pub watchlist: Vec<WatchlistBigramRow>,
pub total_bigrams: usize,
pub total_trigrams: usize,
pub hesitation_threshold_ms: f64,
pub latest_trigram_gain: Option<f64>,
pub scope_label: String,
// Engine thresholds for watchlist progress denominators:
pub min_samples_for_focus: usize, // from ngram_stats::MIN_SAMPLES_FOR_FOCUS
pub stability_streak_required: u8, // from ngram_stats::STABILITY_STREAK_REQUIRED
}
```
**Add field** to `StatsDashboard`: `ngram_data: Option<&'a NgramTabData>`
**Update constructor**, tab header (add `"[6] N-grams"`), footer (`[1-6]`), `render_tab()` dispatch.
**Rendering methods:**
- **`render_ngram_tab()`** — Vertical layout: focus (4 lines), lists (Min 5), summary (2 lines).
- **`render_ngram_focus()`** — Bordered "Active Focus" block.
- Line 1: target name in `colors.focused_key()` + bold
- Line 2: reasoning in `colors.text_pending()`
- When BigramWins + char_key is None: "Bigram selected (no individual char weakness found)"
- Empty state: "Complete some adaptive drills to see focus data"
- **`render_eligible_bigrams()`** — Bordered "Eligible Bigrams (N)" block.
- Header in `colors.accent()` + bold
- Rows colored by difficulty: `error()` (>1.0), `warning()` (>0.5), `success()` (<=0.5)
- Columns: `Pair Diff Err% Exp% Red Conf N`
- Narrow (<38 inner): drop Exp% and Conf
- Truncate rows to available height
- Empty state: "No bigrams meet focus criteria yet"
- **`render_watchlist_bigrams()`** — Bordered "Watchlist" block.
- Columns: `Pair Red Samples Streak`
- Samples rendered as `n/{data.min_samples_for_focus}`, Streak as `n/{data.stability_streak_required}` — denominators sourced from `NgramTabData` (engine constants), never hardcoded in UI
- All rows in `colors.warning()`
- Truncate rows to available height
- Empty state: "No approaching bigrams"
- **`render_ngram_summary()`** — Single line: scope label, bigram/trigram counts, hesitation threshold, trigram gain.
### 3. `src/main.rs` — Input handling + adapter
**`handle_stats_key()`**:
- `STATS_TAB_COUNT`: 5 → 6
- Add `KeyCode::Char('6') => app.stats_tab = 5` in both branches
**`build_ngram_tab_data(app: &App) -> NgramTabData`** — Dedicated adapter function (single point of engine→UI translation):
- Calls `select_focus_target_with_reasoning()`
- Calls `focus_eligible_bigrams()` and `watchlist_bigrams()`
- Maps engine results to `EligibleBigramRow`/`WatchlistBigramRow` by looking up additional per-bigram stats (error rate, expected rate, confidence, streak) from `app.ranked_bigram_stats` and `app.ranked_key_stats`
- Builds scope label from `app.drill_scope`
- Only called when `app.stats_tab == 5`
**`render_stats()`**: Call `build_ngram_tab_data()` when on tab 5, pass `Some(&data)` to StatsDashboard.
---
## Implementation Order
1. Add `FocusReasoning` enum and `select_focus_target_with_reasoning()` to `ngram_stats.rs`
2. Add `focus_eligible_bigrams()` and `watchlist_bigrams()` to `BigramStatsStore`
3. Add unit tests for steps 1-2
4. Add view model structs (`EligibleBigramRow`, `WatchlistBigramRow`, `NgramTabData`) and `ngram_data` field to `stats_dashboard.rs`
5. Add all rendering methods to `stats_dashboard.rs`
6. Update tab header, footer, `render_tab()` dispatch in `stats_dashboard.rs`
7. Add `build_ngram_tab_data()` adapter + update `render_stats()` in `main.rs`
8. Update `handle_stats_key()` in `main.rs`
---
## Verification
### Unit Tests (in `ngram_stats.rs` test module)
**`test_focus_eligible_bigrams_gating`** — BigramStatsStore with bigrams at boundary conditions:
- sample=25, streak=3, redundancy=2.0 → eligible
- sample=15, streak=3, redundancy=2.0 → excluded (samples < 20)
- sample=25, streak=2, redundancy=2.0 → excluded (streak < 3)
- sample=25, streak=3, redundancy=1.2 → excluded (redundancy <= 1.5)
- sample=25, streak=3, redundancy=2.0, confidence=1.5 → excluded (difficulty <= 0)
**`test_focus_eligible_bigrams_ordering_and_tiebreak`** — 3 eligible bigrams: two with same difficulty but different redundancy, one with lower difficulty. Verify sorted by (difficulty desc, redundancy desc, key lexical asc).
**`test_watchlist_bigrams_gating`** — Bigrams at boundary:
- Fully eligible (sample=25, streak=3) → excluded (goes to eligible list)
- High redundancy, low samples (sample=10) → included
- High redundancy, low streak (sample=25, streak=1) → included
- Low redundancy (1.3) → excluded
- Very few samples (sample=2) → excluded (< 3 noise floor)
**`test_watchlist_bigrams_ordering_and_tiebreak`** — 3 watchlist entries: two with same redundancy. Verify sorted by (redundancy desc, key lexical asc).
**`test_select_focus_with_reasoning_bigram_wins`** — Bigram difficulty > char difficulty * 0.8. Returns `BigramWins` with correct values and `char_key: Some(ch)`.
**`test_select_focus_with_reasoning_char_wins`** — Char difficulty high, bigram < threshold. Returns `CharWins` with `bigram_best` populated.
**`test_select_focus_with_reasoning_no_bigrams`** — No eligible bigrams. Returns `NoBigrams`.
**`test_select_focus_with_reasoning_bigram_only`** — No focused char, bigram exists. Returns `BigramWins` with `char_key: None`.
### Build & Existing Tests
- `cargo build` — no compile errors
- `cargo test` — all existing + new tests pass
### Manual Testing
- Navigate to Statistics → press [6] → see N-grams tab
- Tab/BackTab cycles through all 6 tabs
- With no drill history: empty states shown for all panels
- After several adaptive drills: eligible bigrams appear with plausible data
- Scope label reflects current drill scope
- Verify layout at 80x24 terminal size — confirm column drop at narrow widths keeps header/data aligned


@@ -0,0 +1,265 @@
# Plan: Bigram Metrics Overhaul — Error Anomaly & Speed Anomaly
## Context
The current bigram metrics use `difficulty = (1 - confidence) * redundancy` to gate eligibility and focus. This is fundamentally broken: when a user types faster than target WPM (`confidence > 1.0`), difficulty goes negative — even for bigrams with 100% error rate. The root cause is that "confidence" (a speed-vs-target ratio) and "redundancy" (an error-rate ratio) are conflated into a single metric that can cancel out genuine problems.
This overhaul replaces the conflated system with two orthogonal anomaly metrics:
- **`error_anomaly`** — how much worse a bigram's error rate is compared to what's expected from its constituent characters (same math as current `redundancy_score`, reframed as a percentage)
- **`speed_anomaly`** — how much slower a bigram transition is compared to the user's normal speed typing the second character (user-relative, no target WPM dependency)
Both are displayed as percentages where positive = worse than expected. The UI shows two side-by-side columns, one per anomaly type, with confirmed problems highlighted.
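A worked illustration of why the conflated metric fails and why the replacement stays orthogonal (numbers are illustrative, not from real data):

```rust
// Old conflated metric: speed and error signals can cancel.
fn old_difficulty(confidence: f64, redundancy: f64) -> f64 {
    (1.0 - confidence) * redundancy
}

// New metric: an anomaly ratio reframed as a percentage, positive = worse
// than expected. Applied identically to error rates and transition times.
fn anomaly_pct(actual: f64, expected: f64) -> f64 {
    (actual / expected - 1.0) * 100.0
}

// A user typing 20% faster than target (confidence 1.2) on a bigram whose
// error rate is 3x what its chars predict (redundancy 3.0):
//   old: (1.0 - 1.2) * 3.0 = -0.6  -> a genuine problem scored as "easy"
//   new: error_anomaly = +200% regardless of typing speed
```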
---
## Persistence / Migration
**NgramStat is NOT persisted to disk.** N-gram stores are rebuilt from drill history on every startup (see `json_store.rs:104` comment: "N-gram stats are not included — they are always rebuilt from drill history", and `app.rs:1152` `rebuild_ngram_stats()`). The stores are never saved via `save_data()` — only `profile`, `key_stats`, `ranked_key_stats`, and `drill_history` are persisted.
Therefore:
- No serde migration, `#[serde(alias)]`, or backward-compat handling is needed for NgramStat field renames/removals
- `#[serde(default)]` annotations on NgramStat fields are vestigial (the derive exists for in-memory cloning, not disk persistence) but harmless to leave
- The `Serialize`/`Deserialize` derives on NgramStat can stay (used by BigramStatsStore/TrigramStatsStore types which derive them transitively, though the stores themselves are also not persisted)
**KeyStat IS persisted.** However, `confidence` on KeyStat is NOT being changed (it is used by skill_tree progression), so no migration is needed there.
---
## Changes
### 1. `src/engine/ngram_stats.rs` — Metrics engine overhaul
**NgramStat struct** (line 34):
- Remove `confidence: f64` field
- Rename `redundancy_streak: u8` → `error_anomaly_streak: u8`
- Add `speed_anomaly_streak: u8` with `#[serde(default)]`
- **Preserved fields** (explicitly unchanged): `filtered_time_ms`, `best_time_ms`, `sample_count`, `error_count`, `hesitation_count`, `recent_times`, `recent_correct`, `last_seen_drill_index` — all remain and continue to be updated by `update_stat()`
**`update_stat()`** (line 65):
- Remove `confidence = target_time_ms / stat.filtered_time_ms` computation (line 82)
- Remove `target_time_ms` parameter (no longer needed)
- **Keep** `hesitation` parameter and `drill_index` parameter — these update `hesitation_count` (line 72) and `last_seen_drill_index` (line 66) which are used by trigram pruning and other downstream logic
- New signature (module-private, matching current visibility): `fn update_stat(stat: &mut NgramStat, time_ms: f64, correct: bool, hesitation: bool, drill_index: u32)`
- All other field updates remain identical (EMA on filtered_time_ms, best_time_ms, recent_times, recent_correct, error_count, sample_count)
**Constants** (lines 10-16):
- Rename `STABILITY_THRESHOLD` → `ERROR_ANOMALY_RATIO_THRESHOLD` (value stays 1.5)
- Rename `STABILITY_STREAK_REQUIRED` → `ANOMALY_STREAK_REQUIRED` (value stays 3)
- Rename `WATCHLIST_MIN_SAMPLES` → `ANOMALY_MIN_SAMPLES` (value stays 3)
- Add `SPEED_ANOMALY_PCT_THRESHOLD: f64 = 50.0` (50% slower than expected)
- Add `MIN_CHAR_SAMPLES_FOR_SPEED: usize = 10` (EMA alpha=0.1 needs ~10 samples for initial value to decay to ~35% influence; 5 samples still has ~59% initial-value bias, too noisy for baseline)
- Remove `DEFAULT_TARGET_CPM` (no longer used by update_stat or stores)
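The `MIN_CHAR_SAMPLES_FOR_SPEED` rationale is just the geometric decay of the EMA's initial value; a quick check of the arithmetic:

```rust
// Weight remaining on the EMA's initial value after n samples: (1 - alpha)^n.
fn initial_value_weight(alpha: f64, n: u32) -> f64 {
    (1.0 - alpha).powi(n as i32)
}

// With alpha = 0.1:
//   0.9^5  ~= 0.590  (59% initial-value bias: too noisy for a baseline)
//   0.9^10 ~= 0.349  (~35%: acceptable)
```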
**`BigramStatsStore` struct** (line 102):
- Remove `target_cpm: f64` field and `default_target_cpm()` helper
- `BigramStatsStore::update()` (line 114): Remove `target_time_ms` calculation. Pass-through to `update_stat()` without it.
**`TrigramStatsStore` struct** (line 285):
- Remove `target_cpm: f64` field
- `TrigramStatsStore::update()` (line 293): Remove `target_time_ms` calculation. Pass-through to `update_stat()` without it.
**Remove `get_confidence()`** methods on both stores (lines 121, 300) — they read the deleted `confidence` field. Both are `#[allow(dead_code)]` already.
**Rename `redundancy_score()`** → **`error_anomaly_ratio()`** (line 132):
- Same math internally, just renamed. Returns `e_ab / expected_ab`.
**New methods on `BigramStatsStore`**:
```rust
/// Error anomaly as percentage: (ratio - 1.0) * 100
/// Returns None if bigram has no stats.
pub fn error_anomaly_pct(&self, key: &BigramKey, char_stats: &KeyStatsStore) -> Option<f64> {
let _stat = self.stats.get(key)?;
let ratio = self.error_anomaly_ratio(key, char_stats);
Some((ratio - 1.0) * 100.0)
}
/// Speed anomaly: % slower than user types char_b in isolation.
/// Compares bigram filtered_time_ms to char_b's filtered_time_ms.
/// Returns None if bigram has no stats or char_b has < MIN_CHAR_SAMPLES_FOR_SPEED samples.
pub fn speed_anomaly_pct(&self, key: &BigramKey, char_stats: &KeyStatsStore) -> Option<f64> {
let stat = self.stats.get(key)?;
let char_b_stat = char_stats.stats.get(&key.0[1])?;
if char_b_stat.sample_count < MIN_CHAR_SAMPLES_FOR_SPEED { return None; }
let ratio = stat.filtered_time_ms / char_b_stat.filtered_time_ms;
Some((ratio - 1.0) * 100.0)
}
```
**Rename `update_redundancy_streak()`** → **`update_error_anomaly_streak()`** (line 142):
- Same logic, uses renamed constant and renamed field
**New `update_speed_anomaly_streak()`**:
- Same pattern as error streak: call `speed_anomaly_pct()`, compare against `SPEED_ANOMALY_PCT_THRESHOLD`
- If `speed_anomaly_pct()` returns `None` (char baseline unavailable/under-sampled), **hold previous streak value** — don't reset or increment. The bigram simply can't be evaluated for speed yet.
- Requires both bigram samples >= `ANOMALY_MIN_SAMPLES` AND char_b samples >= `MIN_CHAR_SAMPLES_FOR_SPEED` before any streak update occurs.
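The streak transition rule described above, reduced to a pure function for clarity (the real method operates on `NgramStat` fields; this is a sketch):

```rust
/// None from speed_anomaly_pct() (baseline unavailable/under-sampled) holds
/// the previous streak; Some either increments (saturating) or resets it.
fn next_speed_streak(prev: u8, anomaly_pct: Option<f64>, threshold: f64) -> u8 {
    match anomaly_pct {
        None => prev,                                    // can't evaluate yet: hold
        Some(p) if p > threshold => prev.saturating_add(1),
        Some(_) => 0,                                    // back under threshold: reset
    }
}
```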
**New `BigramAnomaly` struct**:
```rust
pub struct BigramAnomaly {
pub key: BigramKey,
pub anomaly_pct: f64,
pub sample_count: usize,
pub streak: u8,
pub confirmed: bool, // streak >= ANOMALY_STREAK_REQUIRED && samples >= MIN_SAMPLES_FOR_FOCUS
}
```
**Replace `focus_eligible_bigrams()` + `watchlist_bigrams()`** with:
- **`error_anomaly_bigrams(&self, char_stats: &KeyStatsStore, unlocked: &[char]) -> Vec<BigramAnomaly>`** — All bigrams with `error_anomaly_ratio > ERROR_ANOMALY_RATIO_THRESHOLD` and `samples >= ANOMALY_MIN_SAMPLES`, sorted by anomaly_pct desc. Each entry's `confirmed` flag = `error_anomaly_streak >= ANOMALY_STREAK_REQUIRED && samples >= MIN_SAMPLES_FOR_FOCUS`.
- **`speed_anomaly_bigrams(&self, char_stats: &KeyStatsStore, unlocked: &[char]) -> Vec<BigramAnomaly>`** — All bigrams where `speed_anomaly_pct() > Some(SPEED_ANOMALY_PCT_THRESHOLD)` and `samples >= ANOMALY_MIN_SAMPLES`, sorted by anomaly_pct desc. Same confirmed logic using `speed_anomaly_streak`.
**Replace `weakest_bigram()`** with **`worst_confirmed_anomaly()`**:
- Takes `char_stats: &KeyStatsStore` and `unlocked: &[char]`
- Collects all confirmed error anomalies and confirmed speed anomalies into a single candidate pool
- Each candidate is `(BigramKey, anomaly_pct, anomaly_type)` where type is `Error` or `Speed`
- **Dedup per bigram**: If a bigram appears in both error and speed lists, keep whichever has higher anomaly_pct (or prefer error on tie)
- Return the single bigram with highest anomaly_pct, or None if no confirmed anomalies
- This eliminates ambiguity about same-bigram-in-both-lists — each bigram gets at most one candidacy
**Update `FocusReasoning` enum** (line 471):
Current variants are: `BigramWins { bigram_difficulty, char_difficulty, char_key }`, `CharWins { char_key, char_difficulty, bigram_best }`, `NoBigrams { char_key }`, `Fallback`.
Replace with:
```rust
pub enum FocusReasoning {
BigramWins {
bigram_anomaly_pct: f64,
anomaly_type: AnomalyType, // Error or Speed
char_key: Option<char>,
},
CharWins {
char_key: char,
bigram_best: Option<(BigramKey, f64)>,
},
NoBigrams {
char_key: char,
},
Fallback,
}
pub enum AnomalyType { Error, Speed }
```
**Update `select_focus_target_with_reasoning()`** (line 489):
- Call `worst_confirmed_anomaly()` instead of `weakest_bigram()`
- **Focus priority rule**: Any confirmed bigram anomaly always wins over char focus. Rationale: char focus is the default skill-tree progression mechanism; confirmed bigram anomalies are exceptional problems that survived a conservative gate (3 consecutive drills above threshold + 20 samples). No cross-scale score comparison needed — confirmation itself is the signal.
- When no confirmed bigram anomalies exist, fall back to char focus as before.
- `anomaly_pct` is unbounded (e.g. 200% = 3x worse than expected) — this is fine because confirmation gating prevents transient spikes from stealing focus, and the value is only used for ranking among confirmed anomalies, not for threshold comparison against char scores.
**Update `select_focus_target()`** (line 545):
- Same delegation change, pass `char_stats` through
### 2. `src/app.rs` — Streak update call sites & store cleanup
**`target_cpm` removal checklist** (complete audit of all references):
| Location | What | Action |
|---|---|---|
| `ngram_stats.rs:105-106` | `BigramStatsStore.target_cpm` field + serde attr | Remove field |
| `ngram_stats.rs:288-289` | `TrigramStatsStore.target_cpm` field + serde attr | Remove field |
| `ngram_stats.rs:109-111` | `fn default_target_cpm()` helper | Remove function |
| `ngram_stats.rs:11` | `const DEFAULT_TARGET_CPM` | Remove constant |
| `ngram_stats.rs:115` | `BigramStatsStore::update()` target_time_ms calc | Remove line |
| `ngram_stats.rs:294` | `TrigramStatsStore::update()` target_time_ms calc | Remove line |
| `ngram_stats.rs:1386` | Test helper `bigram_stats.target_cpm = DEFAULT_TARGET_CPM` | Remove line |
| `app.rs:1155` | `self.bigram_stats.target_cpm = ...` in rebuild_ngram_stats | Remove line |
| `app.rs:1157` | `self.ranked_bigram_stats.target_cpm = ...` | Remove line |
| `app.rs:1159` | `self.trigram_stats.target_cpm = ...` | Remove line |
| `app.rs:1161` | `self.ranked_trigram_stats.target_cpm = ...` | Remove line |
| `key_stats.rs:37` | `KeyStatsStore.target_cpm` | **KEEP** — used by `update_key()` for char confidence |
| `app.rs:330,332,609,611,1320,1322,1897-1898,1964-1965` | `key_stats.target_cpm = ...` | **KEEP** — KeyStatsStore still uses target_cpm |
| `config.rs:142` | `fn target_cpm()` | **KEEP** — still used by KeyStatsStore |
**At all 6 `update_redundancy_streak` call sites** (lines 899, 915, 1044, 1195, 1212, plus rebuild):
- Rename to `update_error_anomaly_streak()`
- Add parallel call to `update_speed_anomaly_streak()` passing the appropriate `&KeyStatsStore`:
- `&self.key_stats` for `self.bigram_stats` updates
- `&self.ranked_key_stats` for `self.ranked_bigram_stats` updates
**Update `select_focus_target` calls** in `generate_drill` (line ~663) and drill header in main.rs:
- Add `ranked_key_stats` parameter (already available at call sites)
### 3. `src/ui/components/stats_dashboard.rs` — Two-column anomaly display
**Replace data structs**:
- Remove `EligibleBigramRow` (line 20) and `WatchlistBigramRow` (line 30)
- Add single `AnomalyBigramRow`:
```rust
pub struct AnomalyBigramRow {
pub pair: String,
pub anomaly_pct: f64,
pub sample_count: usize,
pub streak: u8,
pub confirmed: bool,
}
```
**Replace `NgramTabData` fields** (line 39):
- Remove `eligible_bigrams: Vec<EligibleBigramRow>` and `watchlist_bigrams: Vec<WatchlistBigramRow>`
- Add `error_anomalies: Vec<AnomalyBigramRow>` and `speed_anomalies: Vec<AnomalyBigramRow>`
**Replace render functions**:
- Remove `render_eligible_bigrams()` (line 1473) and `render_watchlist_bigrams()` (line 1560)
- Add `render_error_anomalies()` and `render_speed_anomalies()`
- Each renders a table with columns: `Pair | Anomaly% | Samples | Streak`
- Confirmed rows (`.confirmed == true`) use highlight/accent color
- Unconfirmed rows use dimmer/warning color
- Column titles: `" Error Anomalies ({}) "` and `" Speed Anomalies ({}) "`
- Empty states: `" No error anomalies detected"` / `" No speed anomalies detected"`
**Narrow-width adaptation**:
- Wide mode (width >= 60): 50/50 horizontal split, full columns `Pair | Anomaly% | Samples | Streak`
- Narrow mode (width < 60): Stack vertically (error on top, speed below). Compact columns: `Pair | Anom% | Smp`
- Drop `Streak` column
- Abbreviate headers
- This mirrors the existing pattern used by the current eligible/watchlist tables
- **Vertical space budget** (stacked mode): Each panel gets a minimum of 3 data rows (+ 1 header + 1 border = 5 lines). Remaining vertical space is split evenly. If total available height < 10 lines, show only error anomalies panel (speed anomalies are less actionable). This prevents one panel from starving the other.
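The stacked-mode budget reduces to a small height-splitting rule; a minimal sketch (panel heights include header and border lines; the even split with the odd line going to the error panel is an assumption, not a settled implementation detail):

```rust
/// Returns (error_panel_height, speed_panel_height). Below 10 total lines,
/// only the error panel is shown (each panel needs 5 lines: 3 data rows +
/// header + border).
fn stacked_panel_heights(total: u16) -> (u16, Option<u16>) {
    if total < 10 {
        return (total, None); // speed panel dropped entirely
    }
    let error_h = total / 2 + total % 2; // error panel gets the odd line
    (error_h, Some(total - error_h))
}
```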
**Update `render_ngram_tab()`** (line 1308):
- Split the bottom section into two horizontal chunks (50/50)
- Left: `render_error_anomalies()`, Right: `render_speed_anomalies()`
- On narrow terminals (width < 60), stack vertically instead
### 4. `src/main.rs` — Bridge adapter
**`build_ngram_tab_data()`** (~line 2232):
- Call `error_anomaly_bigrams()` and `speed_anomaly_bigrams()` instead of old functions
- Map `BigramAnomaly` → `AnomalyBigramRow`
- Pass `&ranked_key_stats` for speed anomaly computation
**Drill header** (~line 1133): `select_focus_target()` signature change (adding `char_stats` param) will require updating the call here.
---
## Files Modified
1. **`src/engine/ngram_stats.rs`** — Core metrics overhaul (remove confidence from NgramStat, remove target_cpm from stores, add two anomaly systems, new query functions)
2. **`src/app.rs`** — Update streak calls, remove target_cpm initialization, update select_focus_target calls
3. **`src/ui/components/stats_dashboard.rs`** — Two-column anomaly display, new data structs, narrow-width adaptation
4. **`src/main.rs`** — Bridge adapter, select_focus_target call update
---
## Test Updates
- **Rewrite `test_focus_eligible_bigrams_gating`** → `test_error_anomaly_bigrams`: Test that bigrams above error threshold with sufficient samples appear; confirmed flag set correctly based on streak + samples
- **Rewrite `test_watchlist_bigrams_gating`** → split into `test_error_anomaly_confirmation` and `test_speed_anomaly_bigrams`
- **New `test_speed_anomaly_pct`**: Verify speed anomaly calculation against mock char stats; verify None returned when char_b has < MIN_CHAR_SAMPLES_FOR_SPEED (10) samples; verify correct result at exactly 10 samples (boundary)
- **New `test_speed_anomaly_streak_holds_when_char_unavailable`**: Verify streak is not reset when char baseline is insufficient (samples 0, 5, 9 — all below threshold)
- **New `test_speed_anomaly_borderline_baseline`**: Verify behavior at sample count transitions (9 → None, 10 → Some) and that early-session noise at exactly 10 samples produces reasonable anomaly values (not extreme outliers from EMA initialization bias)
- **Update `test_weakest_bigram*`** → `test_worst_confirmed_anomaly*`: Verify it picks highest anomaly across both types, deduplicates per bigram preferring higher pct (error on tie), returns None when nothing confirmed
- **Update focus reasoning tests**: Update `FocusReasoning` variants to new names (`BigramWins` now carries `anomaly_pct` and `anomaly_type` instead of `bigram_difficulty`)
- **Update `build_ngram_tab_data_maps_fields_correctly`**: Check `error_anomalies`/`speed_anomalies` fields with `AnomalyBigramRow` assertions
---
## Verification
1. `cargo build` — no compile errors
2. `cargo test` — all tests pass
3. Manual: N-grams tab shows two columns (Error Anomalies / Speed Anomalies)
4. Manual: Confirmed problem bigrams appear highlighted; unconfirmed appear dimmer
5. Manual: Drill header still shows `Focus: "th"` for bigram focus
6. Manual: Bigrams previously stuck on watchlist due to negative difficulty now appear as confirmed error anomalies
7. Manual: On narrow terminal (< 60 cols), columns stack vertically with compact headers


@@ -0,0 +1,351 @@
# Plan: EMA Error Decay + Integrated Bigram/Char Focus Generation
## Context
Two problems with the current n-gram focus system:
1. **Focus stickiness**: Bigram anomaly uses cumulative `(error_count+1)/(sample_count+2)` Laplace smoothing. A bigram with 20 errors / 25 samples would need ~54 consecutive correct strokes to drop below the 1.5x threshold. Once confirmed, a bigram dominates focus for many drills even as the user visibly improves, while worse bigrams can't take over.
2. **Post-processing bigram focus causes repetition**: When a bigram is in focus, `apply_bigram_focus()` post-processes finished text by replacing 40% of words with dictionary words containing the bigram. This selects randomly from candidates with no duplicate tracking, causing repeated words. It also means the bigram doesn't influence the actual word selection — it's bolted on after generation and overrides the focused char (the weakest char gets replaced by bigram[0]).
This plan addresses both: (A) switch error rate to EMA so anomalies respond to recent performance, and (B) integrate bigram focus directly into the word selection algorithm alongside char focus, enabling both to be active simultaneously.
---
## Part A: EMA Error Rate Decay
### Approach
Add an `error_rate_ema: f64` field to both `NgramStat` and `KeyStat`, updated via exponential moving average on each keystroke (same pattern as existing `filtered_time_ms`). Use this EMA for all anomaly computations instead of cumulative `(error_count+1)/(sample_count+2)`.
Both bigram AND char error rates must use EMA — `error_anomaly_ratio` divides one by the other, so asymmetric decay would distort the comparison.
**Alpha = 0.1** (same as timing EMA). Half-life ~7 samples. A bigram at 30% error rate recovering with all-correct strokes: drops below 1.5x threshold after ~15 correct (~2 drills). This is responsive without being twitchy.
### Changes
#### `src/engine/ngram_stats.rs`
**NgramStat struct** (line 34):
- Add `error_rate_ema: f64` with `#[serde(default = "default_error_rate_ema")]` and default value `0.5`
- Add `fn default_error_rate_ema() -> f64 { 0.5 }` (Laplace-equivalent neutral prior)
- Remove `recent_correct: Vec<bool>` — superseded by EMA and never read
**`update_stat()`** (line 67):
- After existing `error_count` increment, add EMA update:
```rust
let error_signal = if correct { 0.0 } else { 1.0 };
if stat.sample_count == 1 {
stat.error_rate_ema = error_signal;
} else {
stat.error_rate_ema = EMA_ALPHA * error_signal + (1.0 - EMA_ALPHA) * stat.error_rate_ema;
}
```
- Remove `recent_correct` push/trim logic (lines 89-92)
- Keep `error_count` and `sample_count` (needed for gating thresholds and display)
**`smoothed_error_rate_raw()`** (line 95): Remove. Once `smoothed_error_rate()` on both BigramStatsStore and TrigramStatsStore switches to `error_rate_ema`, this function has no callers.
**`BigramStatsStore::smoothed_error_rate()`** (line 120): Change to return `stat.error_rate_ema` instead of `smoothed_error_rate_raw(stat.error_count, stat.sample_count)`.
**`TrigramStatsStore::smoothed_error_rate()`** (line 333): Same change — return `stat.error_rate_ema`.
**`error_anomaly_ratio()`** (line 123): No changes needed — it calls `self.smoothed_error_rate()` and `char_stats.smoothed_error_rate()`, which now both return EMA values.
**Default for NgramStat** (line 50): Set `error_rate_ema: 0.5` (neutral — same as Laplace `(0+1)/(0+2)`).
#### `src/engine/key_stats.rs`
**KeyStat struct** (line 7):
- Add `error_rate_ema: f64` with `#[serde(default = "default_error_rate_ema")]` and default value `0.5`
- Add `fn default_error_rate_ema() -> f64 { 0.5 }` helper
- **Note**: KeyStat IS persisted to disk. The `#[serde(default)]` ensures backward compat — existing data without the field gets 0.5.
**`update_key()`** (line 50) — called for correct strokes:
- Add EMA update: `stat.error_rate_ema = if stat.total_count == 1 { 0.0 } else { EMA_ALPHA * 0.0 + (1.0 - EMA_ALPHA) * stat.error_rate_ema }`
- Use `total_count` (already incremented on the line before) to detect first sample
**`update_key_error()`** (line 83) — called for error strokes:
- Add EMA update: `stat.error_rate_ema = if stat.total_count == 1 { 1.0 } else { EMA_ALPHA * 1.0 + (1.0 - EMA_ALPHA) * stat.error_rate_ema }`
**`smoothed_error_rate()`** (line 90): Change to return `stat.error_rate_ema` (or 0.5 for missing keys).
#### `src/app.rs`
**`rebuild_ngram_stats()`** (line 1155):
- Reset `error_rate_ema` to `0.5` alongside `error_count` and `total_count` for KeyStat stores (lines 1165-1172)
- NgramStat stores already reset to `Default` which has `error_rate_ema: 0.5`
- The replay loop (line 1177) naturally rebuilds EMA by calling `update_stat()` and `update_key()`/`update_key_error()` in order
No other app.rs changes needed — the streak update and focus selection code reads through `error_anomaly_ratio()` which now uses EMA values transparently.
---
## Part B: Integrated Bigram + Char Focus Generation
### Approach
Replace the exclusive `FocusTarget` enum (either char OR bigram) with a `FocusSelection` struct that carries both independently. The weakest char comes from skill_tree progression; the worst bigram anomaly comes from the anomaly system. Both feed into the `PhoneticGenerator` simultaneously. Remove `apply_bigram_focus()` post-processing entirely.
### Changes
#### `src/engine/ngram_stats.rs` — Focus selection
**Replace `FocusTarget` enum** (line 510):
```rust
// Old
pub enum FocusTarget { Char(char), Bigram(BigramKey) }
// New
#[derive(Clone, Debug, PartialEq)]
pub struct FocusSelection {
pub char_focus: Option<char>,
pub bigram_focus: Option<(BigramKey, f64, AnomalyType)>,
}
```
**Replace `FocusReasoning` enum** (line 523):
```rust
// Old
pub enum FocusReasoning {
BigramWins { bigram_anomaly_pct: f64, anomaly_type: AnomalyType, char_key: Option<char> },
CharWins { char_key: char, bigram_best: Option<(BigramKey, f64)> },
NoBigrams { char_key: char },
Fallback,
}
// New — reasoning is now just the selection itself (both fields self-describe)
// FocusReasoning is removed; FocusSelection carries all needed info.
```
**Simplify `select_focus_target_with_reasoning()`** → **`select_focus()`**:
```rust
pub fn select_focus(
skill_tree: &SkillTree,
scope: DrillScope,
ranked_key_stats: &KeyStatsStore,
ranked_bigram_stats: &BigramStatsStore,
) -> FocusSelection {
let unlocked = skill_tree.unlocked_keys(scope);
let char_focus = skill_tree.focused_key(scope, ranked_key_stats);
let bigram_focus = ranked_bigram_stats.worst_confirmed_anomaly(ranked_key_stats, &unlocked);
FocusSelection { char_focus, bigram_focus }
}
```
Remove `select_focus_target()` and `select_focus_target_with_reasoning()` — replaced by `select_focus()`.
#### `src/generator/mod.rs` — Trait update
**Update `TextGenerator` trait** (line 14):
```rust
pub trait TextGenerator {
fn generate(
&mut self,
filter: &CharFilter,
focused_char: Option<char>,
focused_bigram: Option<[char; 2]>,
word_count: usize,
) -> String;
}
```
#### `src/generator/phonetic.rs` — Integrated word selection
**`generate()` method** — rewrite word selection with tiered approach:
Note: `find_matching(filter, None)` is used (not `focused_char`) because we do our own tiering below. `find_matching` returns ALL words matching the CharFilter — the `focused` param only sorts, never filters — but passing `None` avoids an unnecessary sort we'd discard anyway.
```rust
fn generate(
&mut self,
filter: &CharFilter,
focused_char: Option<char>,
focused_bigram: Option<[char; 2]>,
word_count: usize,
) -> String {
let matching_words: Vec<String> = self.dictionary
.find_matching(filter, None) // no char-sort; we tier ourselves
.iter().map(|s| s.to_string()).collect();
let use_real_words = matching_words.len() >= MIN_REAL_WORDS;
// Pre-categorize words into tiers for real-word mode
let bigram_str = focused_bigram.map(|b| format!("{}{}", b[0], b[1]));
let focus_char_lower = focused_char.filter(|ch| ch.is_ascii_lowercase());
let (bigram_indices, char_indices, other_indices) = if use_real_words {
let mut bi = Vec::new();
let mut ci = Vec::new();
let mut oi = Vec::new();
for (i, w) in matching_words.iter().enumerate() {
if bigram_str.as_ref().is_some_and(|b| w.contains(b.as_str())) {
bi.push(i);
} else if focus_char_lower.is_some_and(|ch| w.contains(ch)) {
ci.push(i);
} else {
oi.push(i);
}
}
(bi, ci, oi)
} else {
(vec![], vec![], vec![])
};
let mut words: Vec<String> = Vec::new();
let mut recent: Vec<String> = Vec::new(); // anti-repeat window
for _ in 0..word_count {
if use_real_words {
let word = self.pick_tiered_word(
&matching_words,
&bigram_indices,
&char_indices,
&other_indices,
&recent,
);
recent.push(word.clone());
if recent.len() > 4 { recent.remove(0); }
words.push(word);
} else {
let word = self.generate_phonetic_word(
filter, focused_char, focused_bigram,
);
words.push(word);
}
}
words.join(" ")
}
```
**New `pick_tiered_word()` method**:
```rust
fn pick_tiered_word(
&mut self,
all_words: &[String],
bigram_indices: &[usize],
char_indices: &[usize],
other_indices: &[usize],
recent: &[String],
) -> String {
// Tier selection probabilities:
// Both available: 40% bigram, 30% char, 30% other
// Only bigram: 50% bigram, 50% other
// Only char: 70% char, 30% other (matches current behavior)
// Neither: 100% other
//
// Try up to 6 times to avoid repeating a recent word.
for _ in 0..6 {
let tier = self.select_tier(bigram_indices, char_indices, other_indices);
let idx = tier[self.rng.gen_range(0..tier.len())];
let word = &all_words[idx];
if !recent.contains(word) {
return word.clone();
}
}
    // Fallback after 6 failed attempts: accept any word from the full pool (may repeat)
let idx = self.rng.gen_range(0..all_words.len());
all_words[idx].clone()
}
```
**`select_tier()` helper**: Returns reference to the tier to sample from based on availability and probability roll. Only considers a tier "available" if it has >= 2 words (prevents unavoidable repeats when a tier has just 1 word and the anti-repeat window rejects it). Falls through to the next tier when the selected tier is too small.
**`try_generate_word()` / `generate_phonetic_word()`** — add bigram awareness for Markov fallback:
- Accept `focused_bigram: Option<[char; 2]>` parameter
- Only attempt bigram forcing when both chars pass the CharFilter (avoids pathological starts when bigram chars are rare/unavailable in current filter scope)
- When eligible: 30% chance to start word with bigram[0] and force bigram[1] as second char, then continue Markov chain from `[' ', bigram[0], bigram[1]]` prefix
- Falls back to existing focused_char logic otherwise
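The eligibility gate in those bullets can be sketched in isolation. `CharFilter` and the Markov chain are internal types, so `passes_filter` is a stand-in predicate and `roll` replaces the generator's RNG; only the gating logic itself comes from the plan:

```rust
/// Decides whether a phonetic word should be forced to start with the
/// focused bigram. Returns the bigram to seed the Markov chain with
/// (prefix [' ', b[0], b[1]]), or None to fall back to focused_char logic.
fn should_force_bigram(
    bigram: Option<[char; 2]>,
    passes_filter: impl Fn(char) -> bool,
    roll: f64,
) -> Option<[char; 2]> {
    let b = bigram?;
    // Both chars must pass the CharFilter: avoids pathological starts
    // when the bigram's chars are rare/unavailable in the current scope.
    if passes_filter(b[0]) && passes_filter(b[1]) && roll < 0.30 {
        Some(b)
    } else {
        None
    }
}

fn main() {
    let lowercase_only = |c: char| c.is_ascii_lowercase();
    assert_eq!(should_force_bigram(Some(['t', 'h']), lowercase_only, 0.1), Some(['t', 'h']));
    assert_eq!(should_force_bigram(Some(['t', 'h']), lowercase_only, 0.9), None); // lost the 30% roll
    assert_eq!(should_force_bigram(Some(['T', 'h']), lowercase_only, 0.1), None); // 'T' filtered out
}
```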
#### `src/generator/code_syntax.rs` + `src/generator/passage.rs`
Add `_focused_bigram: Option<[char; 2]>` parameter to their `generate()` signatures (ignored, matching trait).
#### `src/app.rs` — Pipeline update
**`generate_text()`** (line 653):
- Call `select_focus()` (new function) instead of `select_focus_target()`
- Extract `focused_char` from `selection.char_focus` (the actual weakest char)
- Extract `focused_bigram` from `selection.bigram_focus.map(|(k, _, _)| k.0)`
- Pass both to `generator.generate(filter, focused_char, focused_bigram, word_count)`
- **Remove** the `apply_bigram_focus()` call (lines 784-787)
- Post-processing passes (capitalize, punctuate, numbers, code_patterns) continue to receive `focused_char` — this is now the real weakest char, not the bigram's first char
**Remove `apply_bigram_focus()`** method (lines 1087-1131) entirely.
**Store `FocusSelection`** on App:
- Add `pub current_focus: Option<FocusSelection>` field to App (default `None`)
- Set in `generate_text()` right after `select_focus()` — captures the focus that was actually used to generate the current drill's text
- **Lifecycle**: Set when drill starts (in `generate_text()`). Persists through the drill result screen (so the user sees what was in focus for the drill they just completed). Cleared to `None` when: starting the next drill (overwritten), leaving drill screen, changing drill scope/mode, or on import/reset. This is a snapshot, not live-recomputed — the header always shows what generated the current text.
- Used by drill header display in main.rs (reads `app.current_focus` instead of re-calling `select_focus()`)
#### `src/main.rs` — Drill header + stats adapter
**Drill header** (line 1134):
- Read `app.current_focus` to build focus_text (no re-computation — shows what generated the text)
- Display format: `Focus: 'n' + "th"` (both), `Focus: 'n'` (char only), `Focus: "th"` (bigram only)
- Replace the current `select_focus_target()` call with reading the stored selection
- When `current_focus` is `None`, show no focus text
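The three display formats reduce to one match over the stored selection. A sketch, assuming `BigramKey` wraps a `[char; 2]` (consistent with the `selection.bigram_focus.map(|(k, _, _)| k.0)` extraction in the app.rs section):

```rust
/// Stand-in for the engine's BigramKey, assumed to wrap [char; 2].
struct BigramKey([char; 2]);

/// Builds the drill-header focus text from the stored selection.
/// None means no focus text is shown.
fn focus_text(char_focus: Option<char>, bigram_focus: Option<&BigramKey>) -> Option<String> {
    match (char_focus, bigram_focus) {
        (Some(c), Some(b)) => Some(format!("Focus: '{}' + \"{}{}\"", c, b.0[0], b.0[1])),
        (Some(c), None) => Some(format!("Focus: '{}'", c)),
        (None, Some(b)) => Some(format!("Focus: \"{}{}\"", b.0[0], b.0[1])),
        (None, None) => None,
    }
}

fn main() {
    let th = BigramKey(['t', 'h']);
    assert_eq!(focus_text(Some('n'), Some(&th)).as_deref(), Some("Focus: 'n' + \"th\""));
    assert_eq!(focus_text(Some('n'), None).as_deref(), Some("Focus: 'n'"));
    assert_eq!(focus_text(None, None), None);
}
```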
**`build_ngram_tab_data()`** (line 2253):
- Call `select_focus()` instead of `select_focus_target_with_reasoning()`
- Update `NgramTabData` struct: replace `focus_target: FocusTarget` and `focus_reasoning: FocusReasoning` with `focus: FocusSelection`
#### `src/ui/components/stats_dashboard.rs` — Focus panel
**`NgramTabData`** (line 28):
- Replace `focus_target: FocusTarget` and `focus_reasoning: FocusReasoning` with `focus: FocusSelection`
- Remove `FocusTarget` and `FocusReasoning` imports
**`render_ngram_focus()`** (line 1352):
- Show both focus targets when both active:
- Line 1: `Focus: Char 'n' + Bigram "th"` (or just one if only one active)
- Line 2: Details — `Char 'n': weakest key | Bigram "th": error anomaly 250%`
- When neither active: show fallback message
- Rendering adapts based on which focuses are present
---
## Files Modified
1. **`src/engine/ngram_stats.rs`** — EMA field on NgramStat, EMA-based smoothed_error_rate, `FocusSelection` struct, `select_focus()`, remove old FocusTarget/FocusReasoning
2. **`src/engine/key_stats.rs`** — EMA field on KeyStat, EMA updates in update_key/update_key_error, EMA-based smoothed_error_rate
3. **`src/generator/mod.rs`** — TextGenerator trait: add `focused_bigram` parameter
4. **`src/generator/phonetic.rs`** — Tiered word selection with bigram+char, anti-repeat window, Markov bigram awareness
5. **`src/generator/code_syntax.rs`** — Add ignored `focused_bigram` parameter
6. **`src/generator/passage.rs`** — Add ignored `focused_bigram` parameter
7. **`src/app.rs`** — Use `select_focus()`, pass both focuses to generator, remove `apply_bigram_focus()`, store `current_focus`
8. **`src/main.rs`** — Update drill header, update `build_ngram_tab_data()` adapter
9. **`src/ui/components/stats_dashboard.rs`** — Update NgramTabData, render_ngram_focus for dual focus display
---
## Test Updates
### Part A (EMA)
- **Update `test_error_anomaly_bigrams`**: Set `error_rate_ema` directly instead of relying on cumulative error_count/sample_count for anomaly ratio computation
- **Update `test_worst_confirmed_anomaly_dedup`** and **`_prefers_error_on_tie`**: Same — set EMA values
- **New `test_error_rate_ema_decay`**: Verify that after N correct strokes, error_rate_ema drops as expected. Verify anomaly ratio crosses below threshold after reasonable recovery (~15 correct strokes from 30% error rate).
- **New `test_error_rate_ema_rebuild_from_history`**: Verify that rebuilding from drill history produces same EMA as live updates (deterministic replay)
- **New `test_ema_ranking_stability_during_recovery`**: Two bigrams both confirmed. Bigram A has higher anomaly. User corrects bigram A over several drills while bigram B stays bad. Verify that A's anomaly drops below B's and B becomes the new worst_confirmed_anomaly — clean handoff without oscillation.
- **Update key_stats tests**: Verify EMA updates in `update_key()` and `update_key_error()`, backward compat (serde default)
### Part B (Integrated focus)
- **Replace focus reasoning tests** (`test_select_focus_with_reasoning_*`): Replace with `test_select_focus_*` testing `FocusSelection` struct — verify both char_focus and bigram_focus are populated independently
- **New `test_phonetic_bigram_focus_increases_bigram_words`**: Generate 1200 words with focused_bigram, verify significantly more words contain the bigram than without
- **New `test_phonetic_dual_focus_no_excessive_repeats`**: Generate text with both focuses, verify no word appears > 3 times consecutively
- **Update `build_ngram_tab_data_maps_fields_correctly`**: Update for `FocusSelection` struct instead of FocusTarget/FocusReasoning
- **New `test_find_matching_focused_is_sort_only`** (in `dictionary.rs` or `phonetic.rs`): Verify that `find_matching(filter, Some('k'))` and `find_matching(filter, None)` return the same set of words (same membership, potentially different order). Guards against future regressions where focused param accidentally becomes a filter.
- No `apply_bigram_focus` tests exist to remove (method was untested)
---
## Verification
1. `cargo build` — no compile errors
2. `cargo test` — all tests pass
3. Manual: Start adaptive drill, observe both char and bigram appearing in focus header
4. Manual: Verify drill text contains focused bigram words AND focused char words mixed naturally
5. Manual: Verify no excessive word repetition (the old apply_bigram_focus problem)
6. Manual: Practice a bigram focus target correctly for 2-3 drills → verify it drops out of focus and a different bigram (or char-only) takes over
7. Manual: N-grams tab shows both focuses in the Active Focus panel
8. Manual: Narrow terminal (<60 cols) stacks anomaly panels vertically; very short terminal (<10 rows available for panels) shows only error anomalies panel; focus panel always shows at least line 1