Files

Tyler Hallada 54ddebf054 N-gram metrics overhaul & UI improvements

2026-02-26 01:26:25 -05:00

16 KiB

Raw Blame History

Plan: Bigram Metrics Overhaul — Error Anomaly & Speed Anomaly

Context

The current bigram metrics use difficulty = (1 - confidence) * redundancy to gate eligibility and focus. This is fundamentally broken: when a user types faster than target WPM (confidence > 1.0), difficulty goes negative — even for bigrams with 100% error rate. The root cause is that "confidence" (a speed-vs-target ratio) and "redundancy" (an error-rate ratio) are conflated into a single metric that can cancel out genuine problems.

This overhaul replaces the conflated system with two orthogonal anomaly metrics:

error_anomaly — how much worse a bigram's error rate is compared to what's expected from its constituent characters (same math as current redundancy_score, reframed as a percentage)
speed_anomaly — how much slower a bigram transition is compared to the user's normal speed typing the second character (user-relative, no target WPM dependency)

Both are displayed as percentages where positive = worse than expected. The UI shows two side-by-side columns, one per anomaly type, with confirmed problems highlighted.

Persistence / Migration

NgramStat is NOT persisted to disk. N-gram stores are rebuilt from drill history on every startup (see json_store.rs:104 comment: "N-gram stats are not included — they are always rebuilt from drill history", and app.rs:1152 rebuild_ngram_stats()). The stores are never saved via save_data() — only profile, key_stats, ranked_key_stats, and drill_history are persisted.

Therefore:

No serde migration, #[serde(alias)], or backward-compat handling is needed for NgramStat field renames/removals
#[serde(default)] annotations on NgramStat fields are vestigial (the derive exists for in-memory cloning, not disk persistence) but harmless to leave
The Serialize/Deserialize derives on NgramStat can stay (used by BigramStatsStore/TrigramStatsStore types which derive them transitively, though the stores themselves are also not persisted)

KeyStat IS persisted — confidence on KeyStat is NOT being changed (used by skill_tree progression). No migration needed there.

Changes

1. `src/engine/ngram_stats.rs` — Metrics engine overhaul

NgramStat struct (line 34):

Remove confidence: f64 field
Rename redundancy_streak: u8 → error_anomaly_streak: u8
Add speed_anomaly_streak: u8 with #[serde(default)]
Preserved fields (explicitly unchanged): filtered_time_ms, best_time_ms, sample_count, error_count, hesitation_count, recent_times, recent_correct, last_seen_drill_index — all remain and continue to be updated by update_stat()

update_stat() (line 65):

Remove confidence = target_time_ms / stat.filtered_time_ms computation (line 82)
Remove target_time_ms parameter (no longer needed)
Keep hesitation parameter and drill_index parameter — these update hesitation_count (line 72) and last_seen_drill_index (line 66) which are used by trigram pruning and other downstream logic
New signature (module-private, matching current visibility): fn update_stat(stat: &mut NgramStat, time_ms: f64, correct: bool, hesitation: bool, drill_index: u32)
All other field updates remain identical (EMA on filtered_time_ms, best_time_ms, recent_times, recent_correct, error_count, sample_count)

Constants (lines 10-16):

Rename STABILITY_THRESHOLD → ERROR_ANOMALY_RATIO_THRESHOLD (value stays 1.5)
Rename STABILITY_STREAK_REQUIRED → ANOMALY_STREAK_REQUIRED (value stays 3)
Rename WATCHLIST_MIN_SAMPLES → ANOMALY_MIN_SAMPLES (value stays 3)
Add SPEED_ANOMALY_PCT_THRESHOLD: f64 = 50.0 (50% slower than expected)
Add MIN_CHAR_SAMPLES_FOR_SPEED: usize = 10 (EMA alpha=0.1 needs ~10 samples for initial value to decay to ~35% influence; 5 samples still has ~59% initial-value bias, too noisy for baseline)
Remove DEFAULT_TARGET_CPM (no longer used by update_stat or stores)

BigramStatsStore struct (line 102):

Remove target_cpm: f64 field and default_target_cpm() helper
BigramStatsStore::update() (line 114): Remove target_time_ms calculation. Pass-through to update_stat() without it.

TrigramStatsStore struct (line 285):

Remove target_cpm: f64 field
TrigramStatsStore::update() (line 293): Remove target_time_ms calculation. Pass-through to update_stat() without it.

Remove get_confidence() methods on both stores (lines 121, 300) — they read the deleted confidence field. Both are #[allow(dead_code)] already.

Rename redundancy_score() → error_anomaly_ratio() (line 132):

Same math internally, just renamed. Returns e_ab / expected_ab.

New methods on BigramStatsStore:

/// Error anomaly as percentage: (ratio - 1.0) * 100
/// Returns None if bigram has no stats.
pub fn error_anomaly_pct(&self, key: &BigramKey, char_stats: &KeyStatsStore) -> Option<f64> {
    let _stat = self.stats.get(key)?;
    let ratio = self.error_anomaly_ratio(key, char_stats);
    Some((ratio - 1.0) * 100.0)
}

/// Speed anomaly: % slower than user types char_b in isolation.
/// Compares bigram filtered_time_ms to char_b's filtered_time_ms.
/// Returns None if bigram has no stats or char_b has < MIN_CHAR_SAMPLES_FOR_SPEED samples.
pub fn speed_anomaly_pct(&self, key: &BigramKey, char_stats: &KeyStatsStore) -> Option<f64> {
    let stat = self.stats.get(key)?;
    let char_b_stat = char_stats.stats.get(&key.0[1])?;
    if char_b_stat.sample_count < MIN_CHAR_SAMPLES_FOR_SPEED { return None; }
    let ratio = stat.filtered_time_ms / char_b_stat.filtered_time_ms;
    Some((ratio - 1.0) * 100.0)
}

Rename update_redundancy_streak() → update_error_anomaly_streak() (line 142):

Same logic, uses renamed constant and renamed field

New update_speed_anomaly_streak():

Same pattern as error streak: call speed_anomaly_pct(), compare against SPEED_ANOMALY_PCT_THRESHOLD
If speed_anomaly_pct() returns None (char baseline unavailable/under-sampled), hold previous streak value — don't reset or increment. The bigram simply can't be evaluated for speed yet.
Requires both bigram samples >= ANOMALY_MIN_SAMPLES AND char_b samples >= MIN_CHAR_SAMPLES_FOR_SPEED before any streak update occurs.

New BigramAnomaly struct:

pub struct BigramAnomaly {
    pub key: BigramKey,
    pub anomaly_pct: f64,
    pub sample_count: usize,
    pub streak: u8,
    pub confirmed: bool,  // streak >= ANOMALY_STREAK_REQUIRED && samples >= MIN_SAMPLES_FOR_FOCUS
}

Replace focus_eligible_bigrams() + watchlist_bigrams() with:

error_anomaly_bigrams(&self, char_stats: &KeyStatsStore, unlocked: &[char]) -> Vec<BigramAnomaly> — All bigrams with error_anomaly_ratio > ERROR_ANOMALY_RATIO_THRESHOLD and samples >= ANOMALY_MIN_SAMPLES, sorted by anomaly_pct desc. Each entry's confirmed flag = error_anomaly_streak >= ANOMALY_STREAK_REQUIRED && samples >= MIN_SAMPLES_FOR_FOCUS.
speed_anomaly_bigrams(&self, char_stats: &KeyStatsStore, unlocked: &[char]) -> Vec<BigramAnomaly> — All bigrams where speed_anomaly_pct() > Some(SPEED_ANOMALY_PCT_THRESHOLD) and samples >= ANOMALY_MIN_SAMPLES, sorted by anomaly_pct desc. Same confirmed logic using speed_anomaly_streak.

Replace weakest_bigram() with worst_confirmed_anomaly():

Takes char_stats: &KeyStatsStore and unlocked: &[char]
Collects all confirmed error anomalies and confirmed speed anomalies into a single candidate pool
Each candidate is (BigramKey, anomaly_pct, anomaly_type) where type is Error or Speed
Dedup per bigram: If a bigram appears in both error and speed lists, keep whichever has higher anomaly_pct (or prefer error on tie)
Return the single bigram with highest anomaly_pct, or None if no confirmed anomalies
This eliminates ambiguity about same-bigram-in-both-lists — each bigram gets at most one candidacy

Update FocusReasoning enum (line 471): Current variants are: BigramWins { bigram_difficulty, char_difficulty, char_key }, CharWins { char_key, char_difficulty, bigram_best }, NoBigrams { char_key }, Fallback.

Replace with:

pub enum FocusReasoning {
    BigramWins {
        bigram_anomaly_pct: f64,
        anomaly_type: AnomalyType,  // Error or Speed
        char_key: Option<char>,
    },
    CharWins {
        char_key: char,
        bigram_best: Option<(BigramKey, f64)>,
    },
    NoBigrams {
        char_key: char,
    },
    Fallback,
}

pub enum AnomalyType { Error, Speed }

Update select_focus_target_with_reasoning() (line 489):

Call worst_confirmed_anomaly() instead of weakest_bigram()
Focus priority rule: Any confirmed bigram anomaly always wins over char focus. Rationale: char focus is the default skill-tree progression mechanism; confirmed bigram anomalies are exceptional problems that survived a conservative gate (3 consecutive drills above threshold + 20 samples). No cross-scale score comparison needed — confirmation itself is the signal.
When no confirmed bigram anomalies exist, fall back to char focus as before.
Anomaly_pct is unbounded (e.g. 200% = 3x worse than expected) — this is fine because confirmation gating prevents transient spikes from stealing focus, and the value is only used for ranking among confirmed anomalies, not for threshold comparison against char scores.

Update select_focus_target() (line 545):

Same delegation change, pass char_stats through

2. `src/app.rs` — Streak update call sites & store cleanup

target_cpm removal checklist (complete audit of all references):

Location	What	Action
`ngram_stats.rs:105-106`	`BigramStatsStore.target_cpm` field + serde attr	Remove field
`ngram_stats.rs:288-289`	`TrigramStatsStore.target_cpm` field + serde attr	Remove field
`ngram_stats.rs:109-111`	`fn default_target_cpm()` helper	Remove function
`ngram_stats.rs:11`	`const DEFAULT_TARGET_CPM`	Remove constant
`ngram_stats.rs:115`	`BigramStatsStore::update()` target_time_ms calc	Remove line
`ngram_stats.rs:294`	`TrigramStatsStore::update()` target_time_ms calc	Remove line
`ngram_stats.rs:1386`	Test helper `bigram_stats.target_cpm = DEFAULT_TARGET_CPM`	Remove line
`app.rs:1155`	`self.bigram_stats.target_cpm = ...` in rebuild_ngram_stats	Remove line
`app.rs:1157`	`self.ranked_bigram_stats.target_cpm = ...`	Remove line
`app.rs:1159`	`self.trigram_stats.target_cpm = ...`	Remove line
`app.rs:1161`	`self.ranked_trigram_stats.target_cpm = ...`	Remove line
`key_stats.rs:37`	`KeyStatsStore.target_cpm`	KEEP — used by `update_key()` for char confidence
`app.rs:330,332,609,611,1320,1322,1897-1898,1964-1965`	`key_stats.target_cpm = ...`	KEEP — KeyStatsStore still uses target_cpm
`config.rs:142`	`fn target_cpm()`	KEEP — still used by KeyStatsStore

At all 6 update_redundancy_streak call sites (lines 899, 915, 1044, 1195, 1212, plus rebuild):

Rename to update_error_anomaly_streak()
Add parallel call to update_speed_anomaly_streak() passing the appropriate &KeyStatsStore:
- &self.key_stats for self.bigram_stats updates
- &self.ranked_key_stats for self.ranked_bigram_stats updates

Update select_focus_target calls in generate_drill (line ~663) and drill header in main.rs:

Add ranked_key_stats parameter (already available at call sites)

3. `src/ui/components/stats_dashboard.rs` — Two-column anomaly display

Replace data structs:

Remove EligibleBigramRow (line 20) and WatchlistBigramRow (line 30)

Add single AnomalyBigramRow:

pub struct AnomalyBigramRow {
    pub pair: String,
    pub anomaly_pct: f64,
    pub sample_count: usize,
    pub streak: u8,
    pub confirmed: bool,
}

Replace NgramTabData fields (line 39):

Remove eligible_bigrams: Vec<EligibleBigramRow> and watchlist_bigrams: Vec<WatchlistBigramRow>
Add error_anomalies: Vec<AnomalyBigramRow> and speed_anomalies: Vec<AnomalyBigramRow>

Replace render functions:

Remove render_eligible_bigrams() (line 1473) and render_watchlist_bigrams() (line 1560)
Add render_error_anomalies() and render_speed_anomalies()
Each renders a table with columns: Pair | Anomaly% | Samples | Streak
Confirmed rows (.confirmed == true) use highlight/accent color
Unconfirmed rows use dimmer/warning color
Column titles: " Error Anomalies ({}) " and " Speed Anomalies ({}) "
Empty states: " No error anomalies detected" / " No speed anomalies detected"

Narrow-width adaptation:

Wide mode (width >= 60): 50/50 horizontal split, full columns Pair | Anomaly% | Samples | Streak
Narrow mode (width < 60): Stack vertically (error on top, speed below). Compact columns: Pair | Anom% | Smp
- Drop Streak column
- Abbreviate headers
- This mirrors the existing pattern used by the current eligible/watchlist tables
Vertical space budget (stacked mode): Each panel gets a minimum of 3 data rows (+ 1 header + 1 border = 5 lines). Remaining vertical space is split evenly. If total available height < 10 lines, show only error anomalies panel (speed anomalies are less actionable). This prevents one panel from starving the other.

Update render_ngram_tab() (line 1308):

Split the bottom section into two horizontal chunks (50/50)
Left: render_error_anomalies(), Right: render_speed_anomalies()
On narrow terminals (width < 60), stack vertically instead

4. `src/main.rs` — Bridge adapter

build_ngram_tab_data() (~line 2232):

Call error_anomaly_bigrams() and speed_anomaly_bigrams() instead of old functions
Map BigramAnomaly → AnomalyBigramRow
Pass &ranked_key_stats for speed anomaly computation

Drill header (~line 1133): select_focus_target() signature change (adding char_stats param) will require updating the call here.

Files Modified

src/engine/ngram_stats.rs — Core metrics overhaul (remove confidence from NgramStat, remove target_cpm from stores, add two anomaly systems, new query functions)
src/app.rs — Update streak calls, remove target_cpm initialization, update select_focus_target calls
src/ui/components/stats_dashboard.rs — Two-column anomaly display, new data structs, narrow-width adaptation
src/main.rs — Bridge adapter, select_focus_target call update

Test Updates

Rewrite test_focus_eligible_bigrams_gating → test_error_anomaly_bigrams: Test that bigrams above error threshold with sufficient samples appear; confirmed flag set correctly based on streak + samples
Rewrite test_watchlist_bigrams_gating → split into test_error_anomaly_confirmation and test_speed_anomaly_bigrams
New test_speed_anomaly_pct: Verify speed anomaly calculation against mock char stats; verify None returned when char_b has < MIN_CHAR_SAMPLES_FOR_SPEED (10) samples; verify correct result at exactly 10 samples (boundary)
New test_speed_anomaly_streak_holds_when_char_unavailable: Verify streak is not reset when char baseline is insufficient (samples 0, 5, 9 — all below threshold)
New test_speed_anomaly_borderline_baseline: Verify behavior at sample count transitions (9 → None, 10 → Some) and that early-session noise at exactly 10 samples produces reasonable anomaly values (not extreme outliers from EMA initialization bias)
Update test_weakest_bigram* → test_worst_confirmed_anomaly*: Verify it picks highest anomaly across both types, deduplicates per bigram preferring higher pct (error on tie), returns None when nothing confirmed
Update focus reasoning tests: Update FocusReasoning variants to new names (BigramWins now carries anomaly_pct and anomaly_type instead of bigram_difficulty)
Update build_ngram_tab_data_maps_fields_correctly: Check error_anomalies/speed_anomalies fields with AnomalyBigramRow assertions

Verification

cargo build — no compile errors
cargo test — all tests pass
Manual: N-grams tab shows two columns (Error Anomalies / Speed Anomalies)
Manual: Confirmed problem bigrams appear highlighted; unconfirmed appear dimmer
Manual: Drill header still shows Focus: "th" for bigram focus
Manual: Bigrams previously stuck on watchlist due to negative difficulty now appear as confirmed error anomalies
Manual: On narrow terminal (< 60 cols), columns stack vertically with compact headers

16 KiB Raw Blame History