Code drill feature parity, downloading snippets from github

Phase 1 and 2. Phase 3 will allow custom github repo input.
2026-02-18 05:12:01 +00:00
parent 2d63cffb33
commit d0605f8426
11 changed files with 4520 additions and 372 deletions


@@ -0,0 +1,757 @@
# Code Drill Feature Parity Plan
## Context
The code drill feature is significantly less developed than the passage drill. The passage drill has a full onboarding flow, lazy downloads with progress bars, configurable network/cache settings, and rich content from Project Gutenberg. The code drill only has 4 hardcoded languages with ~20-30 built-in snippets each, a basic language selection screen, and a partially-implemented synchronous GitHub fetch that blocks the UI thread. There's also a completely dead `github_code.rs` file that's never used.
This plan is split into three delivery phases:
1. **Phase 1**: Feature parity with passage drill (onboarding, downloads, progress bar, config)
2. **Phase 2**: Language expansion and extraction improvements
3. **Phase 3**: Custom repo support
## Current Code Drill Analysis
### What exists:
- **`generator/code_syntax.rs`**: `CodeSyntaxGenerator` with built-in snippets for 4 languages (rust, python, javascript, go), a `try_fetch_code()` that synchronously fetches from hardcoded GitHub URLs (blocking UI), `extract_code_snippets()` for parsing functions from source
- **`generator/code_patterns.rs`**: Post-processor that inserts code-like expressions into adaptive drill text (unrelated to code drill mode)
- **`generator/github_code.rs`**: **Dead code** - `GitHubCodeGenerator` struct with `#[allow(dead_code)]`, never referenced outside its own file
- **Config**: Only `code_language: String` - no download/network/onboarding settings
- **Screens**: `CodeLanguageSelect` only - no intro, no download progress
- **Languages**: rust, python, javascript, go, "all"
### What passage drill has that code drill doesn't:
- Onboarding intro screen (`PassageIntro`) with config for downloads/dir/limits
- `passage_onboarding_done` flag (shows intro only on first use)
- `passage_downloads_enabled` toggle
- `passage_download_dir` configurable path
- `passage_paragraphs_per_book` content limit
- Lazy download: on drill start, downloads one book if not cached
- Background download thread with atomic progress reporting
- Download progress screen (`PassageDownloadProgress`) with byte-level progress bar
- Fallback to built-in content when downloads off
### Built-in snippet whitespace review:
- **Rust**: 4-space indent - idiomatic
- **Python**: 4-space indent - idiomatic
- **JavaScript**: 4-space indent - idiomatic
- **Go**: `\t` tab indent - idiomatic
All whitespace is correct. The escaped string format (`\n`, `\t`, `\"`) is hard to read. Converting to raw strings (`r#"..."#`) improves maintainability.
---
## Phase 1: Feature Parity with Passage Drill
Goal: Give code drill the same onboarding, download, caching, and config infrastructure as passage drill. Keep the existing 4 languages. No language expansion yet.
### Step 1.1: Delete dead code
- Delete `src/generator/github_code.rs` entirely
- Remove `pub mod github_code;` from `src/generator/mod.rs`
### Step 1.2: Convert built-in snippets to raw strings
**File**: `src/generator/code_syntax.rs`
Convert all 4 language snippet arrays from escaped strings to `r#"..."#` raw strings. Example:
Before: `"fn main() {\n    println!(\"hello\");\n}"`
After:
```rust
r#"fn main() {
    println!("hello");
}"#
```
Go snippets: `\t` becomes actual tab characters inside raw strings (correct for Go).
Keep all existing snippets at their current count (~20-30 per language). Do NOT reduce them -- since downloads default to off, these are the primary content source for new users.
Validation: run `cargo test` after conversion. Add a focused test that asserts a sample snippet's char content matches expectations (catches any accidental whitespace changes).
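A minimal sketch of the focused whitespace check described above, assuming a hypothetical sample snippet (`SAMPLE_RAW` and `SAMPLE_ESCAPED` are illustrative names, not identifiers from the codebase). It keeps the old escaped form next to the converted raw string and compares them, so any accidental indentation change during conversion shows up as a mismatch:

```rust
/// Hypothetical sample snippet after conversion to a raw string.
const SAMPLE_RAW: &str = r#"fn main() {
    println!("hello");
}"#;

/// The same snippet in the old escaped form, kept only for comparison.
const SAMPLE_ESCAPED: &str = "fn main() {\n    println!(\"hello\");\n}";

/// Returns true when the raw-string form is character-for-character
/// identical to the escaped form it replaced.
pub fn raw_matches_escaped() -> bool {
    SAMPLE_RAW == SAMPLE_ESCAPED
}

/// Returns the leading whitespace of the given line of the raw snippet,
/// which is what an accidental re-indent would change.
pub fn leading_whitespace(line_index: usize) -> Option<String> {
    SAMPLE_RAW.lines().nth(line_index).map(|line| {
        line.chars().take_while(|c| c.is_whitespace()).collect()
    })
}
```

In the real test the escaped literal would probably be copied from the pre-conversion source for one snippet per language.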
### Step 1.3: Add config fields for code drill
**File**: `src/config.rs`
Add fields mirroring passage drill config:
```rust
#[serde(default = "default_code_downloads_enabled")]
pub code_downloads_enabled: bool, // default: false
#[serde(default = "default_code_download_dir")]
pub code_download_dir: String, // default: dirs::data_dir()/keydr/code/
#[serde(default = "default_code_snippets_per_repo")]
pub code_snippets_per_repo: usize, // default: 50
#[serde(default = "default_code_onboarding_done")]
pub code_onboarding_done: bool, // default: false
```
`code_download_dir` default uses `dirs::data_dir()` (same pattern as `default_passage_download_dir`) for cross-platform portability.
`code_snippets_per_repo` is a **download-time extraction cap**: when fetching from a repo, extract at most this many snippets and write them to cache. The generator reads whatever is in the cache without re-filtering.
Update `Default` impl. Add `default_*` functions.
**Config normalization**: After deserialization in `App::new()` (not `Config::load()`, to avoid coupling config to generator internals), validate `code_language` against `code_language_options()`. If invalid (e.g., old/renamed key), reset to `"rust"`.
**Old cache migration**: The old `DiskCache("code_cache")` entries (in `~/.local/share/keydr/code_cache/`) are simply ignored. They used a different key format (`{lang}_snippets`) and location. No migration or cleanup needed -- they'll be naturally superseded by the new cache in `code_download_dir`.
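The config normalization rule can be sketched as a small pure function. This is an illustration under assumptions: `normalize_code_language` is a hypothetical helper name, and the valid keys are passed in as a slice rather than read from `code_language_options()`:

```rust
/// Returns `stored` unchanged when it is a known language key,
/// otherwise falls back to "rust" as the plan describes.
/// `valid_keys` stands in for the keys from `code_language_options()`.
pub fn normalize_code_language(stored: &str, valid_keys: &[&str]) -> String {
    if valid_keys.contains(&stored) {
        stored.to_string()
    } else {
        "rust".to_string()
    }
}
```

Keeping the check in a function that takes the key list as a parameter would let `App::new()` call it without `Config` knowing about the generator's registry.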
### Step 1.4: Define language data structures
**File**: `src/generator/code_syntax.rs`
Add structures for the language registry. Phase 1 only populates the 4 existing languages + "all":
```rust
pub struct CodeLanguage {
    pub key: &'static str, // filesystem-safe identifier (e.g. "rust", "bash")
    pub display_name: &'static str, // UI label (e.g. "Rust", "Shell/Bash")
    pub extensions: &'static [&'static str], // e.g. &[".rs"], &[".py", ".pyi"]
    pub repos: &'static [CodeRepo],
    pub has_builtin: bool,
}

pub struct CodeRepo {
    pub key: &'static str, // filesystem-safe identifier for cache naming
    pub urls: &'static [&'static str], // raw.githubusercontent.com file URLs to fetch
}

pub const CODE_LANGUAGES: &[CodeLanguage] = &[
    CodeLanguage {
        key: "rust",
        display_name: "Rust",
        extensions: &[".rs"],
        repos: &[
            CodeRepo {
                key: "tokio",
                urls: &[
                    "https://raw.githubusercontent.com/tokio-rs/tokio/master/tokio/src/sync/mutex.rs",
                    "https://raw.githubusercontent.com/tokio-rs/tokio/master/tokio/src/net/tcp/stream.rs",
                ],
            },
            CodeRepo {
                key: "serde",
                urls: &[
                    "https://raw.githubusercontent.com/serde-rs/serde/master/serde/src/ser/mod.rs",
                ],
            },
        ],
        has_builtin: true,
    },
    // ... python, javascript, go with similar structure
    // Move existing hardcoded URLs from try_fetch_code() into these repo definitions
];
```
Helper functions:
```rust
pub fn code_language_options() -> Vec<(&'static str, String)>
// Returns [("rust", "Rust"), ("python", "Python"), ..., ("all", "All (random)")]
pub fn language_by_key(key: &str) -> Option<&'static CodeLanguage>
pub fn is_language_cached(cache_dir: &str, key: &str) -> bool
// Checks if any {key}_*.txt files exist in cache_dir AND have non-empty content (>0 bytes)
// Uses direct filesystem scanning (NOT DiskCache -- DiskCache has no list/glob API)
```
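One possible shape for `is_language_cached`, using only `std::fs` as the comments above suggest. The signature comes from the plan; the body is a sketch, and the exact matching rules (prefix `{key}_`, suffix `.txt`, regular file, more than zero bytes) are taken from the description rather than from existing code:

```rust
use std::fs;
use std::path::Path;

/// Returns true if `cache_dir` contains at least one non-empty
/// `{key}_*.txt` file. A missing or unreadable directory counts as
/// "not cached".
pub fn is_language_cached(cache_dir: &str, key: &str) -> bool {
    let prefix = format!("{key}_");
    let Ok(entries) = fs::read_dir(Path::new(cache_dir)) else {
        return false;
    };
    entries.flatten().any(|entry| {
        let name = entry.file_name();
        let name = name.to_string_lossy();
        name.starts_with(&prefix)
            && name.ends_with(".txt")
            && entry
                .metadata()
                .map(|meta| meta.is_file() && meta.len() > 0)
                .unwrap_or(false)
    })
}
```

Because the prefix includes the trailing underscore, a key such as `c` does not match files written for `cpp`.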
### Step 1.5: Generalize download job struct
**File**: `src/app.rs`
Rename `PassageDownloadJob` to `DownloadJob`. It's already generic (just `Arc<AtomicU64>`, `Arc<AtomicBool>`, and a thread handle). Update all passage references to use the renamed type. No behavior change.
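For readers unfamiliar with the pattern, here is a rough sketch of what a job built from those three pieces might look like. The field names and the `spawn_simulated_download` helper are assumptions for illustration; the real struct in `src/app.rs` may be laid out differently. The worker thread here only adds up chunk sizes instead of performing network I/O:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread::{self, JoinHandle};

/// Illustrative stand-in for the generalized job type (field names assumed).
pub struct DownloadJob {
    pub downloaded_bytes: Arc<AtomicU64>,
    pub done: Arc<AtomicBool>,
    pub handle: Option<JoinHandle<()>>,
}

/// Spawns a worker that "downloads" the given chunk sizes and reports
/// progress through the shared atomics, so the UI thread can poll
/// without blocking.
pub fn spawn_simulated_download(chunks: Vec<u64>) -> DownloadJob {
    let downloaded_bytes = Arc::new(AtomicU64::new(0));
    let done = Arc::new(AtomicBool::new(false));
    let bytes = Arc::clone(&downloaded_bytes);
    let finished = Arc::clone(&done);
    let handle = thread::spawn(move || {
        for chunk in chunks {
            bytes.fetch_add(chunk, Ordering::Relaxed);
        }
        // Publish completion after all progress updates.
        finished.store(true, Ordering::Release);
    });
    DownloadJob {
        downloaded_bytes,
        done,
        handle: Some(handle),
    }
}
```

Since nothing in this shape is passage-specific, renaming the type should be enough for code downloads to share it.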
### Step 1.6: Add code drill app state
**File**: `src/app.rs`
Add `CodeDownloadCompleteAction` enum (parallels `PassageDownloadCompleteAction`):
```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CodeDownloadCompleteAction {
    StartCodeDrill,
    ReturnToSettings,
}
```
Add screen variants:
```rust
CodeIntro, // Onboarding screen for code drill
CodeDownloadProgress, // Download progress for code files
```
Add app fields:
```rust
pub code_intro_selected: usize,
pub code_intro_downloads_enabled: bool,
pub code_intro_download_dir: String,
pub code_intro_snippets_per_repo: usize,
pub code_intro_downloading: bool,
pub code_intro_download_total: usize,
pub code_intro_downloaded: usize,
pub code_intro_current_repo: String,
pub code_intro_download_bytes: u64,
pub code_intro_download_bytes_total: u64,
pub code_download_queue: Vec<usize>, // repo indices within current language's repos array
pub code_drill_language_override: Option<String>,
pub code_download_action: CodeDownloadCompleteAction,
code_download_job: Option<DownloadJob>,
```
### Step 1.7: Remove blocking fetch from generator
**File**: `src/generator/code_syntax.rs`
Remove `try_fetch_code()` from `CodeSyntaxGenerator`. All network I/O moves to the app layer with background threads.
Update constructor:
```rust
pub fn new(rng: SmallRng, language: &str, cache_dir: &str) -> Self
```
Update `load_cached_snippets()`: scan `cache_dir` for files matching `{language}_*.txt`, read each, split on `---SNIPPET---` delimiter. This replaces the `DiskCache("code_cache")` approach with direct filesystem reads (since `DiskCache` has no listing/glob API and the cache dir is now user-configurable).
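The delimiter handling could look roughly like the following. `split_cached_snippets` is a hypothetical helper name; the sketch assumes snippets are separated by a line containing only `---SNIPPET---` and that leading indentation inside a snippet must be preserved (important for tab-indented Go):

```rust
const SNIPPET_DELIMITER: &str = "---SNIPPET---";

/// Splits the contents of one cache file into snippets.
/// Only surrounding newlines are trimmed, so indentation on the first
/// line of a snippet survives. Empty segments are dropped.
pub fn split_cached_snippets(contents: &str) -> Vec<String> {
    contents
        .split(SNIPPET_DELIMITER)
        .map(|segment| segment.trim_matches('\n'))
        .filter(|segment| !segment.trim().is_empty())
        .map(str::to_string)
        .collect()
}
```

The generator would call this once per `{language}_*.txt` file and concatenate the results.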
### Step 1.8: Add download function
**File**: `src/generator/code_syntax.rs`
```rust
pub fn download_code_repo_to_cache_with_progress<F>(
    cache_dir: &str,
    language_key: &str,
    repo: &CodeRepo,
    snippets_limit: usize,
    on_progress: F,
) -> bool
where
    F: FnMut(u64, Option<u64>),
```
This function:
1. Creates `cache_dir` if needed (`fs::create_dir_all`)
2. Fetches each URL in `repo.urls` using `fetch_url_bytes_with_progress` (already exists in `cache.rs`)
3. Runs `extract_code_snippets()` on each fetched file
4. Combines all snippets, truncates to `snippets_limit`
5. Writes to `{cache_dir}/{language_key}_{repo.key}.txt` with `---SNIPPET---` delimiter
6. Returns `true` on success
**Error handling**: If any individual URL fails (404, timeout, network error), skip it and continue with others. If zero snippets extracted from all URLs, return `false`. The app layer treats `false` as "skip this repo, continue queue" (same as passage drill's failure behavior).
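The skip-and-continue policy can be isolated from the network and filesystem for testing. The sketch below assumes a hypothetical helper, `collect_repo_snippets`, that takes the fetch and extract steps as closures; the real function would pass `fetch_url_bytes_with_progress` and `extract_code_snippets` instead and then write the result to the cache file:

```rust
/// Fetches every URL, skipping failures, and returns at most `limit`
/// snippets. Returns `None` when nothing could be extracted, which the
/// caller maps to the `false` return value described above.
pub fn collect_repo_snippets<F, E>(
    urls: &[&str],
    mut fetch: F,
    extract: E,
    limit: usize,
) -> Option<Vec<String>>
where
    F: FnMut(&str) -> Option<String>,
    E: Fn(&str) -> Vec<String>,
{
    let mut all = Vec::new();
    for &url in urls {
        // A failed URL (404, timeout, network error) is skipped, not fatal.
        if let Some(source) = fetch(url) {
            all.extend(extract(&source));
        }
    }
    all.truncate(limit);
    if all.is_empty() { None } else { Some(all) }
}
```

Truncating after combining all files means an early, snippet-rich file can crowd out later ones; whether that matters depends on how the URLs are ordered.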
### Step 1.9: Implement code drill flow methods
**File**: `src/app.rs`
**`go_to_code_intro()`**: Initialize intro screen state (downloads toggle, dir, snippets limit from config). Set `code_download_action = CodeDownloadCompleteAction::StartCodeDrill`. Set screen to `CodeIntro`.
**`start_code_drill()`**: Lazy download logic with explicit language resolution:
```rust
pub fn start_code_drill(&mut self) {
    // Step 1: Resolve concrete language (never download with "all" selected)
    if self.code_drill_language_override.is_none() {
        let chosen = if self.config.code_language == "all" {
            // Pick from languages with built-in OR cached content only
            // Never pick a network-only language that isn't cached
            let available = languages_with_content(&self.config.code_download_dir);
            if available.is_empty() {
                "rust".to_string() // ultimate fallback
            } else {
                let idx = self.rng.gen_range(0..available.len());
                available[idx].to_string()
            }
        } else {
            self.config.code_language.clone()
        };
        self.code_drill_language_override = Some(chosen);
    }
    let chosen = self.code_drill_language_override.clone().unwrap();

    // Step 2: Check if we need to download
    if self.config.code_downloads_enabled
        && !is_language_cached(&self.config.code_download_dir, &chosen)
    {
        if let Some(lang) = language_by_key(&chosen) {
            if !lang.repos.is_empty() {
                // Pick one random repo to download
                let repo_idx = self.rng.gen_range(0..lang.repos.len());
                self.code_download_queue = vec![repo_idx];
                self.code_intro_download_total = 1;
                self.code_intro_downloaded = 0;
                self.code_intro_downloading = true;
                self.code_intro_current_repo = lang.repos[repo_idx].key.to_string();
                self.code_download_action = CodeDownloadCompleteAction::StartCodeDrill;
                self.code_download_job = None;
                self.screen = AppScreen::CodeDownloadProgress;
                return;
            }
        }
        // Language has no repos or unknown: fall through to built-in
    }

    // Step 3: If language has no built-in AND no cache AND downloads off → fallback
    if !is_language_cached(&self.config.code_download_dir, &chosen) {
        if let Some(lang) = language_by_key(&chosen) {
            if !lang.has_builtin {
                // Network-only language with no cache: fall back to "rust"
                self.code_drill_language_override = Some("rust".to_string());
            }
        }
    }

    // Step 4: Start the drill
    self.drill_mode = DrillMode::Code;
    self.drill_scope = DrillScope::Global;
    self.start_drill();
}
```
Key behavior: `"all"` only selects from `languages_with_content()` (built-in OR cached). This prevents the dead-end loop of repeatedly picking uncached network-only languages and forcing download screens. In Phase 2, once network-only languages get cached via manual download, they are automatically included in `"all"` selection.
**`languages_with_content(cache_dir: &str) -> Vec<&'static str>`**: Returns language keys that have either `has_builtin: true` or non-empty cache files in `cache_dir`.
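To make the `"all"` resolution rule concrete, here is a sketch that separates it from the filesystem and the RNG. It differs from the plan's signature on purpose: `LangEntry` is a reduced stand-in for `CodeLanguage`, the cache check is injected as a closure, and `resolve_language` (a hypothetical name) takes the random draw as a plain index:

```rust
/// Reduced stand-in for `CodeLanguage` with only the fields needed here.
pub struct LangEntry {
    pub key: &'static str,
    pub has_builtin: bool,
}

/// Returns keys of languages that have built-in snippets or cached
/// downloads. `is_cached` replaces the real cache-directory scan.
pub fn languages_with_content(
    registry: &[LangEntry],
    is_cached: impl Fn(&str) -> bool,
) -> Vec<&'static str> {
    registry
        .iter()
        .filter(|lang| lang.has_builtin || is_cached(lang.key))
        .map(|lang| lang.key)
        .collect()
}

/// Resolves the selected language to a concrete key. `pick` stands in
/// for the RNG draw and is reduced modulo the number of candidates.
pub fn resolve_language(selected: &str, available: &[&'static str], pick: usize) -> String {
    if selected != "all" {
        return selected.to_string();
    }
    if available.is_empty() {
        "rust".to_string() // ultimate fallback, as in the plan
    } else {
        available[pick % available.len()].to_string()
    }
}
```

With this split, the two Phase 1 tests for `languages_with_content` reduce to calling it with a closure that always returns `false`.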
**`process_code_download_tick()`**, **`spawn_code_download_job()`**: Same pattern as passage equivalents, using `download_code_repo_to_cache_with_progress` and `DownloadJob`.
**`start_code_downloads_from_settings()`**: Mirror `start_passage_downloads_from_settings()` with `CodeDownloadCompleteAction::ReturnToSettings`.
### Step 1.10: Update code language select flow
**File**: `src/main.rs`
Update `handle_code_language_key()` and `render_code_language_select()`:
- Still shows the same 4+1 languages for now (Phase 2 expands this)
- Wire Enter to `confirm_code_language_and_continue()`:
```rust
fn confirm_code_language_and_continue(app: &mut App, langs: &[&str]) {
    if app.code_language_selected >= langs.len() {
        return;
    }
    app.config.code_language = langs[app.code_language_selected].to_string();
    let _ = app.config.save();
    if app.config.code_onboarding_done {
        app.start_code_drill();
    } else {
        app.go_to_code_intro();
    }
}
```
### Step 1.11: Add event handlers and renderers
**File**: `src/main.rs`
Add to screen dispatch in `handle_key()` and `render()`:
**`handle_code_intro_key()`**: Same field navigation as `handle_passage_intro_key()` but operates on `code_intro_*` fields. 4 fields:
1. Enable network downloads (toggle)
2. Download directory (editable text)
3. Snippets per repo (numeric, adjustable)
4. Start code drill (confirm button)
On confirm: save config fields, set `code_onboarding_done = true`, call `start_code_drill()`.
**`handle_code_download_progress_key()`**: Esc/q to cancel. On cancel:
1. Clear `code_download_queue`
2. Set `code_intro_downloading = false`
3. If a `code_download_job` is in-flight, detach it (set to `None` without joining -- the thread will finish and write to cache, which is harmless; the `Arc` atomics keep the thread safe)
4. Reset `code_drill_language_override` to `None`
5. Go to menu
This matches the existing passage download cancel behavior (passage also does not join/abort in-flight threads on Esc).
**`render_code_intro()`**: Mirror `render_passage_intro()` layout. Title: "Code Downloads Setup". Explanatory text: "Configure code source settings before your first code drill." / "Downloads are lazy: code is fetched only when first needed."
**`render_code_download_progress()`**: Mirror `render_passage_download_progress()`. Title: "Downloading Code Source". Show repo name, byte progress bar.
Update tick handler:
```rust
if (app.screen == AppScreen::CodeIntro
    || app.screen == AppScreen::CodeDownloadProgress)
    && app.code_intro_downloading
{
    app.process_code_download_tick();
}
```
### Step 1.12: Update generate_text for Code mode
**File**: `src/app.rs`
Update `DrillMode::Code` in `generate_text()`:
```rust
DrillMode::Code => {
    let filter = CharFilter::new(('a'..='z').collect());
    let lang = self.code_drill_language_override
        .clone()
        .unwrap_or_else(|| self.config.code_language.clone());
    let rng = SmallRng::from_rng(&mut self.rng).unwrap();
    let mut generator = CodeSyntaxGenerator::new(
        rng, &lang, &self.config.code_download_dir,
    );
    self.code_drill_language_override = None;
    let text = generator.generate(&filter, None, word_count);
    (text, Some(generator.last_source().to_string()))
}
```
### Step 1.13: Settings integration
**Files**: `src/main.rs`, `src/app.rs`
Add settings rows after existing code language field (index 3):
- Index 4: Code Downloads: On/Off
- Index 5: Code Download Dir: editable path
- Index 6: Code Snippets per Repo: numeric
- Index 7: Download Code Now: action button
Shift existing passage settings indices up by 4. Update `settings_cycle_forward`/`settings_cycle_backward` and max `settings_selected` bound.
**"Download Code Now" behavior**: Downloads all uncached curated repos for the currently selected `code_language` only. If `code_language == "all"`, downloads all uncached repos for all curated languages. Does NOT include custom repos. Mirrors passage behavior where "Download Passages Now" downloads all uncached books.
**`start_code_downloads()`**: Queues all uncached repos for the currently selected language. Used by intro screen "confirm" flow when downloads are enabled.
### Phase 1 Verification
1. `cargo build` -- compiles
2. `cargo test` -- all existing tests pass, plus new tests:
- `test_languages_with_content_includes_builtin` -- verifies built-in languages appear in `languages_with_content()` even with empty cache dir
- `test_languages_with_content_excludes_uncached_network_only` -- verifies network-only languages without cache are not returned
- `test_config_serde_defaults` -- verifies new config fields deserialize with correct defaults from empty/old configs
- `test_raw_string_snippets_preserved` -- spot-check that raw string conversion didn't alter snippet content
3. `cargo build --no-default-features` -- compiles, network features gated
4. Manual tests:
- Menu → Code Drill → language select → first time shows CodeIntro
- CodeIntro with downloads off → confirms → starts drill with built-in snippets
- CodeIntro with downloads on → confirms → shows CodeDownloadProgress → downloads repo → starts drill with downloaded content
- Subsequent code drills skip onboarding
- "all" language mode only picks from languages with content (never triggers download)
- Settings shows code drill fields, values persist on restart
- Passage drill flow completely unchanged
- Esc during download progress → returns to menu, no crash
---
## Phase 2: Language Expansion and Extraction Improvements
Goal: Add 8 more built-in languages and ~18 network-only languages, improve snippet extraction.
### Step 2.1: Add 8 built-in language snippet sets
**File**: `src/generator/code_syntax.rs`
Add ~10-15 raw-string snippets each for: **typescript, java, c, cpp, ruby, swift, bash, lua**
Language keys: `typescript`/`ts`, `java`, `c`, `cpp`, `ruby`, `swift`, `bash` (display: "Shell/Bash"), `lua`
All with idiomatic whitespace:
- TypeScript: 4-space indent
- Java: 4-space indent
- C: 4-space indent
- C++: 4-space indent
- Ruby: 2-space indent
- Swift: 4-space indent
- Bash: 2-space indent (common convention)
- Lua: 2-space indent
Update `get_snippets()` match to include all 12 languages.
### Step 2.2: Expand language registry to ~30 languages
**File**: `src/generator/code_syntax.rs`
Add ~18 network-only entries to `CODE_LANGUAGES` with curated repos:
kotlin, scala, haskell, elixir, clojure, perl, php, r, dart, zig, nim, ocaml, erlang, julia, objective-c, groovy, csharp, fsharp
Each gets 2-3 repos with specific raw.githubusercontent.com file URLs. **Exclude SQL and CSS** -- their syntax is too different from procedural code for function-level extraction to work well.
This is a significant data curation subtask: for each language, identify 2-3 well-known repos with permissive licenses (MIT/Apache/BSD), select 2-5 representative source files per repo with functions/methods to extract.
**Acceptance threshold**: Each language must yield at least 10 extractable snippets from its curated repos (verified by running `extract_code_snippets` against fetched files). Languages that fall below this threshold should be dropped from the registry rather than shipped with poor content.
### Step 2.3: Improve snippet extraction
**File**: `src/generator/code_syntax.rs`
Add a `block_style` field to `CodeLanguage`. Each `BlockStyle` variant carries the function-start patterns for its language:
```rust
pub struct CodeLanguage {
    // ... existing fields ...
    pub block_style: BlockStyle,
}

pub enum BlockStyle {
    Braces(&'static [&'static str]), // fn/def/func patterns, brace-delimited (C, Java, Go, etc.)
    Indentation(&'static [&'static str]), // def/class patterns, indentation-delimited (Python)
    EndDelimited(&'static [&'static str]), // def/class patterns, closed by `end` keyword (Ruby, Lua, Elixir)
}
```
Update `extract_code_snippets()` to accept `BlockStyle`:
- `Braces`: current behavior with configurable start patterns (C, Java, Go, JS, etc.)
- `Indentation`: track indent level changes to find block boundaries (Python only)
- `EndDelimited`: scan for matching `end` keyword at same indent level to close blocks (Ruby, Lua, Elixir)
Language-specific patterns:
- Java: `["public ", "private ", "protected ", "static ", "class ", "interface "]`
- Ruby: `["def ", "class ", "module "]` (EndDelimited style -- uses `end` keyword to close blocks)
- C/C++: `["int ", "void ", "char ", "float ", "double ", "struct ", "class ", "template"]`
- Swift: `["func ", "class ", "struct ", "enum ", "protocol "]`
- Bash: `["function ", "() {"]` (Braces style, simple)
- etc.
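As an illustration of the `Braces` style with configurable start patterns, the following sketch captures a block from a matching line until brace depth returns to zero. `extract_brace_blocks` is a hypothetical name, not the existing `extract_code_snippets()`, and the sketch deliberately ignores braces inside strings and comments and applies no length filters:

```rust
/// Extracts brace-delimited blocks whose first line (after leading
/// whitespace) starts with one of `start_patterns`.
pub fn extract_brace_blocks(source: &str, start_patterns: &[&str]) -> Vec<String> {
    let mut snippets = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut depth: i32 = 0;
    let mut capturing = false;
    let mut seen_open = false;

    for line in source.lines() {
        if !capturing {
            let trimmed = line.trim_start();
            if !start_patterns.iter().any(|p| trimmed.starts_with(*p)) {
                continue;
            }
            // A new block starts on this line.
            capturing = true;
            seen_open = false;
            depth = 0;
            current.clear();
        }
        current.push(line);
        for ch in line.chars() {
            match ch {
                '{' => {
                    depth += 1;
                    seen_open = true;
                }
                '}' => depth -= 1,
                _ => {}
            }
        }
        // The block is complete once an opening brace was seen and every
        // brace opened since the start line has been closed.
        if seen_open && depth <= 0 {
            snippets.push(current.join("\n"));
            capturing = false;
        }
    }
    snippets
}
```

A known weakness of this approach is a matching line without a body (for example a trait method declaration ending in `;`), which keeps capturing until the next block closes.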
### Step 2.4: Make language select scrollable
**File**: `src/main.rs`
With 30+ languages, the selection screen needs scrolling. Add `code_language_scroll: usize` to `App`. Show a viewport of ~15 items. Add keybindings:
- Up/Down: navigate
- PageUp/PageDown: jump 10 items
- Home/End or `g`/`G`: jump to top/bottom
- `/`: type-to-filter (optional, nice-to-have)
Mark each language as "(built-in)" or "(download required)" in the list.
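The viewport logic for `code_language_scroll` can be kept in a small pure function. This is a sketch; `adjust_scroll` is a hypothetical helper name and the arguments are plain indices rather than `App` fields:

```rust
/// Returns the scroll offset that keeps `selected` inside a window of
/// `viewport` rows, moving the window only when the selection leaves it.
pub fn adjust_scroll(selected: usize, scroll: usize, viewport: usize) -> usize {
    if viewport == 0 {
        return selected;
    }
    if selected < scroll {
        // Selection moved above the window: scroll up to it.
        selected
    } else if selected >= scroll + viewport {
        // Selection moved below the window: make it the last visible row.
        selected + 1 - viewport
    } else {
        scroll
    }
}
```

Calling this after every navigation key (Up/Down, PageUp/PageDown, Home/End) would cover all the keybindings with one code path.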
### Phase 2 Verification
1. `cargo build && cargo test`
2. Manual: verify all 12 built-in languages produce readable snippets with correct indentation
3. Manual: select a network-only language → triggers download → produces good snippets
4. Manual: scrollable language list works, indicators are accurate
5. Verify each built-in language's snippet whitespace is idiomatic
---
## Phase 3: Custom Repo Support
Goal: Let users specify their own GitHub repos to train on.
### Step 3.1: Design custom repo fetch strategy
Custom repos require solving problems that curated repos don't have:
- **Branch discovery**: Use GitHub API `GET /repos/{owner}/{repo}` to find `default_branch`. Requires `User-Agent` header (GitHub rejects requests without it; use `"keydr/{version}"`). Optionally support a `GITHUB_TOKEN` env var for authenticated requests (raises rate limit from 60 to 5000 req/hour).
- **File discovery**: Use GitHub API `GET /repos/{owner}/{repo}/git/trees/{branch}?recursive=1` to list all files, filter by language extensions. Same `User-Agent` and optional auth headers. If the response has `"truncated": true` (repos with >100k files), reject with a user-facing error: "Repository is too large for automatic file discovery. Please use a smaller repo or fork with fewer files."
- **Rate limiting**: Cache the tree response to disk. On 403/429 responses, show error: "GitHub API rate limit reached. Try again later or set GITHUB_TOKEN env var for higher limits."
- **File selection**: From matching files, randomly select 3-5 files to download via raw.githubusercontent.com (no API needed for file content)
- **Language detection**: Match file extensions against `CodeLanguage.extensions` field. If ambiguous or no match, prompt user.
- **All API requests**: Set `Accept: application/vnd.github.v3+json` header, timeout 10s.
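The URL construction and extension filtering from the list above can be sketched without any HTTP client. The helper names (`tree_url`, `raw_file_url`, `filter_by_extension`) are hypothetical; the endpoint shapes follow the plan and the curated URLs earlier in this document. The sketch does no percent-encoding, so paths with unusual characters would need extra handling:

```rust
/// URL for the recursive tree listing of a branch.
pub fn tree_url(owner: &str, repo: &str, branch: &str) -> String {
    format!("https://api.github.com/repos/{owner}/{repo}/git/trees/{branch}?recursive=1")
}

/// URL for fetching one file's raw content (no API quota involved).
pub fn raw_file_url(owner: &str, repo: &str, branch: &str, path: &str) -> String {
    format!("https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}")
}

/// Keeps only the paths ending in one of the language's extensions,
/// e.g. `&[".py", ".pyi"]` from `CodeLanguage.extensions`.
pub fn filter_by_extension<'a>(paths: &[&'a str], extensions: &[&str]) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|path| extensions.iter().any(|ext| path.ends_with(*ext)))
        .collect()
}
```

The random selection of 3-5 files would then operate on the filtered list before any raw URLs are built.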
### Step 3.2: Add config field and validation
**File**: `src/config.rs`
```rust
#[serde(default)]
pub code_custom_repos: Vec<String>, // Format: "owner/repo" or "owner/repo@language"
```
Parse function:
```rust
pub fn parse_custom_repo(input: &str) -> Option<CustomRepo> {
    // Accepts: "owner/repo", "owner/repo@language", "https://github.com/owner/repo"
    // Validates: owner and repo contain only valid GitHub chars
    // Returns None on invalid input
}
```
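A possible implementation of the parser, offered as a sketch. The `CustomRepo` fields are assumptions (the plan does not define the struct), and the character rules used here (alphanumerics and `-` for owners, plus `_` and `.` for repo names) are an approximation of GitHub's naming rules rather than a verified specification:

```rust
/// Assumed shape of the parsed result; not defined in the plan.
#[derive(Debug, PartialEq, Eq)]
pub struct CustomRepo {
    pub owner: String,
    pub repo: String,
    pub language: Option<String>,
}

/// Parses "owner/repo", "owner/repo@language", or a github.com URL.
/// Returns `None` for anything that does not fit those shapes.
pub fn parse_custom_repo(input: &str) -> Option<CustomRepo> {
    let trimmed = input.trim();
    let trimmed = trimmed.strip_prefix("https://github.com/").unwrap_or(trimmed);
    let trimmed = trimmed.trim_end_matches('/');

    // Split off the optional "@language" suffix.
    let (path, language) = match trimmed.split_once('@') {
        Some((path, lang)) if !lang.is_empty() => (path, Some(lang.to_string())),
        Some(_) => return None,
        None => (trimmed, None),
    };

    let (owner, repo) = path.split_once('/')?;
    let repo = repo.strip_suffix(".git").unwrap_or(repo);

    let owner_ok = !owner.is_empty()
        && owner.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
    let repo_ok = !repo.is_empty()
        && repo.chars().all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.'));

    if owner_ok && repo_ok {
        Some(CustomRepo {
            owner: owner.to_string(),
            repo: repo.to_string(),
            language,
        })
    } else {
        None
    }
}
```

Rejecting extra path segments (such as `owner/repo/tree/main`) falls out of the character check, because `/` is not allowed in the repo part.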
### Step 3.3: Settings UI for custom repos
Add a settings section showing current custom repos as a scrollable list. Keybindings:
- `a`: add new repo (enters text input mode)
- `d`/`x`: delete selected repo
- Up/Down: navigate list
### Step 3.4: Code language select "Add custom repo" option
At the bottom of the language select list, add an "[ + Add custom repo ]" option. Selecting it enters a text input mode for `owner/repo`. On confirm:
1. Validate format
2. Add to `code_custom_repos` config
3. Auto-detect language from repo (via API tree listing file extensions)
4. If language ambiguous, show a small picker
5. Queue download of that repo
### Step 3.5: Integrate custom repos into download flow
When `start_code_drill()` runs for a language, include matching custom repos in the download candidates alongside curated repos.
### Phase 3 Verification
1. Add a custom repo → appears in settings list
2. Start drill → custom repo snippets appear
3. Invalid repo format → shows error, doesn't save
4. GitHub rate limit → shows informative error
5. Remove custom repo → removed from config and future drills
---
## Critical Files Summary
| File | Phase | Changes |
|------|-------|---------|
| `src/generator/github_code.rs` | 1 | Delete |
| `src/generator/mod.rs` | 1 | Remove github_code module |
| `src/generator/code_syntax.rs` | 1, 2 | Raw strings, new constructor, remove blocking fetch, language registry, download fn, new snippet sets, improved extraction |
| `src/config.rs` | 1, 3 | New code drill config fields, validation |
| `src/app.rs` | 1 | DownloadJob rename, new screens/state/flow methods, CodeDownloadCompleteAction |
| `src/main.rs` | 1, 2 | New handlers/renderers, updated settings, scrollable language list |
| `src/generator/cache.rs` | 1 | No changes (reuse existing `fetch_url_bytes_with_progress`) |
## Existing Code to Reuse
- `generator::cache::fetch_url_bytes_with_progress` -- already handles progress callbacks, used for passage downloads
- `generator::cache::DiskCache` -- NOT reused for code cache (no listing API); use direct `fs::read_dir` + `fs::read_to_string` instead
- `PassageDownloadJob` pattern (atomics + thread) -- generalized into `DownloadJob`
- `passage::extract_paragraphs` pattern -- referenced for extraction design but not directly reused
- `passage::download_book_to_cache_with_progress` -- structural template for `download_code_repo_to_cache_with_progress`
---
## Phase 2.5: Improve Snippet Extraction Quality
### Context
After Phase 2, the verification test (`test_verify_repo_urls`) shows many languages producing far fewer than 100 snippets. Root causes:
1. **Per-file cap of 50** in `extract_code_snippets()` (line 1869) limits output even from large source files
2. **Keyword-only matching** — extraction only starts when a line begins with a recognized keyword (e.g. `fn `, `def `, `class `). Many valid code blocks (anonymous functions, method chains, match arms, closures, etc.) are missed.
3. **Narrow keyword lists** — some languages are missing patterns for common constructs (e.g. `macro_rules!` in Rust, `@interface` in Objective-C)
4. **`code_snippets_per_repo` default of 50** caps total output per download
### Goal
Get every language to produce 100+ snippets from its curated repos, without sacrificing snippet quality. Do this by:
1. Widening keyword patterns to capture more language constructs
2. Adding a structural fallback that extracts well-formed code blocks by structure when keywords alone don't find enough
3. Raising the per-file and per-repo snippet caps
### Step 2.5.1: Raise snippet caps
**File**: `src/generator/code_syntax.rs`
Change `snippets.truncate(50)` to `snippets.truncate(200)` in `extract_code_snippets()`.
**File**: `src/config.rs`
Change `default_code_snippets_per_repo()` to return `200`.
### Step 2.5.2: Widen keyword patterns
**File**: `src/generator/code_syntax.rs`
Add missing start patterns to existing languages. These are patterns that should have been there from the start — they represent common, well-defined constructs that produce good typing drill snippets:
| Language | Add patterns |
|----------|-------------|
| Rust | `"macro_rules! "`, `"mod "`, `"const "`, `"static "`, `"type "` |
| Python | `"async def "` is already there. Add `"@"` (decorators start blocks) |
| JavaScript | `"class "`, `"const "`, `"let "`, `"export "` |
| Go | No changes needed (already has `"func "`, `"type "`) |
| TypeScript | `"class "`, `"const "`, `"let "`, `"export "`, `"interface "` |
| Java | `"abstract "`, `"final "`, `"@"` (annotations start blocks) |
| C | `"typedef "`, `"#define "`, `"enum "` |
| C++ | `"namespace "`, `"typedef "`, `"#define "`, `"enum "`, `"constexpr "`, `"auto "` |
| Ruby | Add `"attr_"`, `"scope "`, `"describe "`, `"it "` |
| Swift | `"var "`, `"let "`, `"init("`, `"deinit "`, `"extension "`, `"typealias "` |
| Bash | `"if "`, `"for "`, `"while "`, `"case "` |
| Kotlin | `"override fun "` already there. Add `"val "`, `"var "`, `"enum "`, `"annotation "`, `"typealias "` |
| Scala | `"val "`, `"var "`, `"type "`, `"implicit "`, `"given "`, `"extension "` |
| PHP | `"class "`, `"interface "`, `"trait "`, `"enum "` |
| Dart | Add `"Widget "`, `"get "`, `"set "`, `"enum "`, `"typedef "`, `"extension "` |
| Elixir | `"defmacro "`, `"defstruct"`, `"defprotocol "`, `"defimpl "` |
| Zig | `"test "`, `"var "` |
| Haskell | Already broad. No changes. |
| Objective-C | `"@interface "`, `"@implementation "`, `"@protocol "`, `"typedef "` |
| Others | Review on a case-by-case basis during implementation |
### Step 2.5.3: Add structural fallback extraction
**File**: `src/generator/code_syntax.rs`
When keyword-based extraction yields fewer than 20 snippets from a file, run a second pass that extracts code blocks purely by structure. This captures anonymous functions, nested blocks, and other constructs that don't start with recognized keywords.
#### Design
Add a `structural_fallback: bool` field to each `BlockStyle` variant:
```rust
pub enum BlockStyle {
    Braces {
        patterns: &'static [&'static str],
        structural_fallback: bool,
    },
    Indentation {
        patterns: &'static [&'static str],
        structural_fallback: bool,
    },
    EndDelimited {
        patterns: &'static [&'static str],
        structural_fallback: bool,
    },
}
```
Set `structural_fallback: true` for all languages. This can be disabled per-language if it produces poor results.
Update `extract_code_snippets()`:
```rust
pub fn extract_code_snippets(source: &str, block_style: &BlockStyle) -> Vec<String> {
    let mut snippets = keyword_extract(source, block_style);
    if snippets.len() < 20 && has_structural_fallback(block_style) {
        let structural = structural_extract(source, block_style);
        // Add structural snippets that keyword extraction did not already find.
        // This is an exact-match check; partially overlapping blocks are kept.
        for s in structural {
            if !snippets.contains(&s) {
                snippets.push(s);
            }
        }
    }
    snippets.truncate(200);
    snippets
}
```
#### Structural extraction for Braces languages
`structural_extract_braces(source)`:
1. Scan for lines containing `{` where brace depth transitions from 0→1 or 1→2
2. Capture from that line until depth returns to its starting level
3. Apply the same quality filters: 3-30 lines, 20+ non-whitespace chars, ≤800 bytes
4. Skip noise blocks: reject snippets where first non-blank line is only `{`, or where the block is just imports/use statements
#### Structural extraction for Indentation languages
`structural_extract_indent(source)`:
1. Scan for non-blank lines at indentation level 0 (top-level) that are followed by indented lines
2. Capture the top-level line + all subsequent lines with greater indentation
3. Apply same quality filters
4. Skip noise: reject if all body lines are `import`/`from`/`use`/`#include` statements
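Steps 1 and 2 above might be sketched as follows. This version only finds the blocks; the quality filters and noise rejection from steps 3 and 4 are assumed to run afterwards. Treating any line that starts with whitespace as "indented" is a simplification:

```rust
/// Captures each top-level line together with the indented lines that
/// follow it. Top-level lines with no indented body are skipped.
pub fn structural_extract_indent(source: &str) -> Vec<String> {
    let lines: Vec<&str> = source.lines().collect();
    let mut snippets = Vec::new();
    let mut i = 0;

    while i < lines.len() {
        let line = lines[i];
        let is_top_level = !line.trim().is_empty() && !line.starts_with(char::is_whitespace);
        if !is_top_level {
            i += 1;
            continue;
        }
        // Extend over blank lines and lines indented deeper than column 0.
        let mut end = i + 1;
        while end < lines.len()
            && (lines[end].trim().is_empty() || lines[end].starts_with(char::is_whitespace))
        {
            end += 1;
        }
        // Drop trailing blank lines from the captured block.
        let mut last = end;
        while last > i + 1 && lines[last - 1].trim().is_empty() {
            last -= 1;
        }
        if last > i + 1 {
            snippets.push(lines[i..last].join("\n"));
        }
        i = end;
    }
    snippets
}
```

Blank lines inside a body are kept, so a function with internal paragraph breaks is still captured as one block.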
#### Structural extraction for EndDelimited languages
`structural_extract_end(source)`:
1. Scan for lines at top-level indentation followed by indented body ending with `end`
2. Same quality filters and noise rejection
#### Noise filtering
A snippet is "noise" and should be rejected if:
- First meaningful line (after stripping comments) is just `{` or `}`
- Body consists entirely of `import`, `use`, `from`, `require`, `include`, or blank lines
- It's a single-statement block (only 1 non-blank body line after the opening)
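
The three rules above could be combined into one predicate, sketched here under some assumptions: only `//` and `# ` line comments are recognized, closing delimiters (`}`, `};`, `end`) are not counted as body lines, and an empty body is treated as noise as well. All function names are hypothetical.

```rust
/// Assumption: only `//` and `# ` line comments are recognized.
fn is_line_comment(line: &str) -> bool {
    line.starts_with("//") || line.starts_with("# ")
}

/// True when the line starts with an import-like keyword as a whole word.
fn is_import(line: &str) -> bool {
    let line = line.trim_start_matches('#'); // `#include` is matched as `include`
    ["import", "use", "from", "require", "include"]
        .iter()
        .any(|&kw| {
            line.strip_prefix(kw).map_or(false, |rest| {
                !rest.starts_with(|c: char| c.is_alphanumeric() || c == '_')
            })
        })
}

pub fn is_noise(snippet: &str) -> bool {
    // Drop blank lines and comments before applying the rules.
    let meaningful: Vec<&str> = snippet
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !is_line_comment(l))
        .collect();
    let Some((first, rest)) = meaningful.split_first() else {
        return true; // nothing but blanks and comments
    };
    // Rule 1: the first meaningful line is a lone brace.
    if *first == "{" || *first == "}" {
        return true;
    }
    let body: Vec<&str> = rest
        .iter()
        .copied()
        .filter(|l| !matches!(*l, "}" | "};" | "end"))
        .collect();
    // Rule 3: single-statement (or empty) body.
    if body.len() <= 1 {
        return true;
    }
    // Rule 2: the body is made up of import-like statements only.
    body.iter().all(|l| is_import(l))
}
```

The whole-word check in `is_import` keeps identifiers such as `user_count` or `from_cache` from being mistaken for `use` and `from` statements.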
### Step 2.5.4: Add more source URLs for low-count languages
After implementing the extraction improvements, re-run `test_verify_repo_urls` to identify languages still under 100 snippets. For those, add 1-2 more source file URLs from the same or new repos to increase raw material.
This step is intentionally deferred until after extraction improvements, since better extraction may push many languages over the 100 threshold without needing more URLs.
### Phase 2.5 Verification
1. `cargo test` — all existing tests pass
2. Run `cargo test test_verify_repo_urls -- --ignored --nocapture` — verify all 30 languages produce 50+ snippets (ideally 100+)
3. Spot-check structural fallback snippets for 3-4 languages — verify they contain real code, not just import blocks or noise
4. `cargo build --no-default-features` — compiles without network features
5. Verify no change to built-in snippet behavior (built-in snippets don't go through extraction)


@@ -16,7 +16,11 @@ use crate::engine::skill_tree::{BranchId, BranchStatus, DrillScope, SkillTree};
 use crate::generator::TextGenerator;
 use crate::generator::capitalize;
 use crate::generator::code_patterns;
-use crate::generator::code_syntax::CodeSyntaxGenerator;
+use crate::generator::code_syntax::{
+    CodeSyntaxGenerator, build_code_download_queue, code_language_options,
+    download_code_repo_to_cache_with_progress, is_language_cached, language_by_key,
+    languages_with_content,
+};
 use crate::generator::dictionary::Dictionary;
 use crate::generator::numbers;
 use crate::generator::passage::{
@@ -48,6 +52,8 @@ pub enum AppScreen {
     PassageBookSelect,
     PassageIntro,
     PassageDownloadProgress,
+    CodeIntro,
+    CodeDownloadProgress,
 }
 #[derive(Clone, Copy, Debug, PartialEq, Eq)]
@@ -63,7 +69,13 @@ pub enum PassageDownloadCompleteAction {
     ReturnToSettings,
 }
-struct PassageDownloadJob {
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum CodeDownloadCompleteAction {
+    StartCodeDrill,
+    ReturnToSettings,
+}
+struct DownloadJob {
     downloaded_bytes: Arc<AtomicU64>,
     total_bytes: Arc<AtomicU64>,
     done: Arc<AtomicBool>,
@@ -112,6 +124,7 @@ pub struct App {
     pub skill_tree_detail_scroll: usize,
     pub drill_source_info: Option<String>,
     pub code_language_selected: usize,
+    pub code_language_scroll: usize,
     pub passage_book_selected: usize,
     pub passage_intro_selected: usize,
     pub passage_intro_downloads_enabled: bool,
@@ -126,18 +139,37 @@ pub struct App {
     pub passage_download_queue: Vec<usize>,
     pub passage_drill_selection_override: Option<String>,
     pub passage_download_action: PassageDownloadCompleteAction,
+    pub code_intro_selected: usize,
+    pub code_intro_downloads_enabled: bool,
+    pub code_intro_download_dir: String,
+    pub code_intro_snippets_per_repo: usize,
+    pub code_intro_downloading: bool,
+    pub code_intro_download_total: usize,
+    pub code_intro_downloaded: usize,
+    pub code_intro_current_repo: String,
+    pub code_intro_download_bytes: u64,
+    pub code_intro_download_bytes_total: u64,
+    pub code_download_queue: Vec<(String, usize)>,
+    pub code_drill_language_override: Option<String>,
+    pub code_download_attempted: bool,
+    pub code_download_action: CodeDownloadCompleteAction,
     pub shift_held: bool,
     pub keyboard_model: KeyboardModel,
     rng: SmallRng,
     transition_table: TransitionTable,
     #[allow(dead_code)]
     dictionary: Dictionary,
-    passage_download_job: Option<PassageDownloadJob>,
+    passage_download_job: Option<DownloadJob>,
+    code_download_job: Option<DownloadJob>,
 }
 impl App {
     pub fn new() -> Self {
-        let config = Config::load().unwrap_or_default();
+        let mut config = Config::load().unwrap_or_default();
+        // Normalize code_language: reset to default if not a valid option
+        let valid_keys: Vec<&str> = code_language_options().iter().map(|(k, _)| *k).collect();
+        config.normalize_code_language(&valid_keys);
         let loaded_theme = Theme::load(&config.theme).unwrap_or_default();
         let theme: &'static Theme = Box::leak(Box::new(loaded_theme));
         let menu = Menu::new(theme);
@@ -183,6 +215,9 @@ impl App {
         let intro_downloads_enabled = config.passage_downloads_enabled;
         let intro_download_dir = config.passage_download_dir.clone();
         let intro_paragraph_limit = config.passage_paragraphs_per_book;
+        let code_intro_downloads_enabled = config.code_downloads_enabled;
+        let code_intro_download_dir = config.code_download_dir.clone();
+        let code_intro_snippets_per_repo = config.code_snippets_per_repo;
         let mut app = Self {
             screen: AppScreen::Menu,
@@ -211,6 +246,7 @@ impl App {
             skill_tree_detail_scroll: 0,
             drill_source_info: None,
             code_language_selected: 0,
+            code_language_scroll: 0,
             passage_book_selected: 0,
             passage_intro_selected: 0,
             passage_intro_downloads_enabled: intro_downloads_enabled,
@@ -225,12 +261,27 @@ impl App {
             passage_download_queue: Vec::new(),
             passage_drill_selection_override: None,
             passage_download_action: PassageDownloadCompleteAction::StartPassageDrill,
+            code_intro_selected: 0,
+            code_intro_downloads_enabled,
+            code_intro_download_dir,
+            code_intro_snippets_per_repo,
+            code_intro_downloading: false,
+            code_intro_download_total: 0,
+            code_intro_downloaded: 0,
+            code_intro_current_repo: String::new(),
+            code_intro_download_bytes: 0,
+            code_intro_download_bytes_total: 0,
+            code_download_queue: Vec::new(),
+            code_drill_language_override: None,
+            code_download_attempted: false,
+            code_download_action: CodeDownloadCompleteAction::StartCodeDrill,
             shift_held: false,
             keyboard_model,
             rng: SmallRng::from_entropy(),
             transition_table,
             dictionary,
             passage_download_job: None,
+            code_download_job: None,
         };
         app.start_drill();
         app
@@ -368,15 +419,17 @@ impl App {
             }
             DrillMode::Code => {
                 let filter = CharFilter::new(('a'..='z').collect());
-                let lang = if self.config.code_language == "all" {
-                    let langs = ["rust", "python", "javascript", "go"];
-                    let idx = self.rng.gen_range(0..langs.len());
-                    langs[idx].to_string()
-                } else {
-                    self.config.code_language.clone()
-                };
+                let lang = self
+                    .code_drill_language_override
+                    .clone()
+                    .unwrap_or_else(|| self.config.code_language.clone());
                 let rng = SmallRng::from_rng(&mut self.rng).unwrap();
-                let mut generator = CodeSyntaxGenerator::new(rng, &lang);
+                let mut generator = CodeSyntaxGenerator::new(
+                    rng,
+                    &lang,
+                    &self.config.code_download_dir,
+                );
+                self.code_drill_language_override = None;
                 let text = generator.generate(&filter, None, word_count);
                 (text, Some(generator.last_source().to_string()))
             }
@@ -648,11 +701,13 @@ impl App {
     }
     pub fn go_to_code_language_select(&mut self) {
-        let langs = ["rust", "python", "javascript", "go", "all"];
-        self.code_language_selected = langs
+        let options = code_language_options();
+        self.code_language_selected = options
             .iter()
-            .position(|&l| l == self.config.code_language)
+            .position(|(k, _)| *k == self.config.code_language)
             .unwrap_or(0);
+        // Center the selected item in the viewport (rough estimate of 15 visible rows)
+        self.code_language_scroll = self.code_language_selected.saturating_sub(7);
         self.screen = AppScreen::CodeLanguageSelect;
     }
@@ -689,6 +744,215 @@ impl App {
         self.screen = AppScreen::PassageIntro;
     }
+    pub fn go_to_code_intro(&mut self) {
+        self.code_intro_selected = 0;
+        self.code_intro_downloads_enabled = self.config.code_downloads_enabled;
+        self.code_intro_download_dir = self.config.code_download_dir.clone();
+        self.code_intro_snippets_per_repo = self.config.code_snippets_per_repo;
+        self.code_intro_downloading = false;
+        self.code_intro_download_total = 0;
+        self.code_intro_downloaded = 0;
+        self.code_intro_current_repo.clear();
+        self.code_intro_download_bytes = 0;
+        self.code_intro_download_bytes_total = 0;
+        self.code_download_queue.clear();
+        self.code_download_job = None;
+        self.code_download_action = CodeDownloadCompleteAction::StartCodeDrill;
+        self.code_download_attempted = false;
+        self.screen = AppScreen::CodeIntro;
+    }
+    pub fn start_code_drill(&mut self) {
+        // Step 1: Resolve concrete language (never download with "all" selected)
+        if self.code_drill_language_override.is_none() {
+            let chosen = if self.config.code_language == "all" {
+                let available = languages_with_content(&self.config.code_download_dir);
+                if available.is_empty() {
+                    "rust".to_string()
+                } else {
+                    let idx = self.rng.gen_range(0..available.len());
+                    available[idx].to_string()
+                }
+            } else {
+                self.config.code_language.clone()
+            };
+            self.code_drill_language_override = Some(chosen);
+        }
+        let chosen = self.code_drill_language_override.clone().unwrap();
+        // Step 2: Check if we need to download (only if not already attempted)
+        if self.config.code_downloads_enabled
+            && !self.code_download_attempted
+            && !is_language_cached(&self.config.code_download_dir, &chosen)
+        {
+            if let Some(lang) = language_by_key(&chosen) {
+                if !lang.repos.is_empty() {
+                    let repo_idx = self.rng.gen_range(0..lang.repos.len());
+                    self.code_download_queue = vec![(chosen.clone(), repo_idx)];
+                    self.code_intro_download_total = 1;
+                    self.code_intro_downloaded = 0;
+                    self.code_intro_downloading = true;
+                    self.code_intro_current_repo = lang.repos[repo_idx].key.to_string();
+                    self.code_download_action = CodeDownloadCompleteAction::StartCodeDrill;
+                    self.code_download_job = None;
+                    self.code_download_attempted = true;
+                    self.screen = AppScreen::CodeDownloadProgress;
+                    return;
+                }
+            }
+        }
+        // Step 3: If language has no built-in AND no cache → fallback
+        if !is_language_cached(&self.config.code_download_dir, &chosen) {
+            if let Some(lang) = language_by_key(&chosen) {
+                if !lang.has_builtin {
+                    self.code_drill_language_override = Some("rust".to_string());
+                }
+            }
+        }
+        // Step 4: Start the drill
+        self.code_download_attempted = false;
+        self.drill_mode = DrillMode::Code;
+        self.drill_scope = DrillScope::Global;
+        self.start_drill();
+    }
+    pub fn start_code_downloads(&mut self) {
+        let queue = build_code_download_queue(
+            &self.config.code_language,
+            &self.code_intro_download_dir,
+        );
+        self.code_intro_download_total = queue.len();
+        self.code_download_queue = queue;
+        self.code_intro_downloaded = 0;
+        self.code_intro_downloading = self.code_intro_download_total > 0;
+        self.code_intro_download_bytes = 0;
+        self.code_intro_download_bytes_total = 0;
+        self.code_download_job = None;
+    }
+    pub fn start_code_downloads_from_settings(&mut self) {
+        self.go_to_code_intro();
+        self.code_download_action = CodeDownloadCompleteAction::ReturnToSettings;
+        self.start_code_downloads();
+        if !self.code_intro_downloading {
+            self.go_to_settings();
+        }
+    }
+    pub fn process_code_download_tick(&mut self) {
+        if !self.code_intro_downloading {
+            return;
+        }
+        if self.code_download_job.is_none() {
+            let Some((lang_key, repo_idx)) = self.code_download_queue.pop() else {
+                self.code_intro_downloading = false;
+                self.code_intro_current_repo.clear();
+                match self.code_download_action {
+                    CodeDownloadCompleteAction::StartCodeDrill => self.start_code_drill(),
+                    CodeDownloadCompleteAction::ReturnToSettings => self.go_to_settings(),
+                }
+                return;
+            };
+            self.spawn_code_download_job(&lang_key, repo_idx);
+            return;
+        }
+        let mut finished = false;
+        if let Some(job) = self.code_download_job.as_mut() {
+            self.code_intro_download_bytes = job.downloaded_bytes.load(Ordering::Relaxed);
+            self.code_intro_download_bytes_total = job.total_bytes.load(Ordering::Relaxed);
+            finished = job.done.load(Ordering::Relaxed);
+        }
+        if !finished {
+            return;
+        }
+        if let Some(mut job) = self.code_download_job.take() {
+            if let Some(handle) = job.handle.take() {
+                let _ = handle.join();
+            }
+            self.code_intro_downloaded = self.code_intro_downloaded.saturating_add(1);
+        }
+        if self.code_intro_downloaded >= self.code_intro_download_total {
+            self.code_intro_downloading = false;
+            self.code_intro_current_repo.clear();
+            self.code_intro_download_bytes = 0;
+            self.code_intro_download_bytes_total = 0;
+            match self.code_download_action {
+                CodeDownloadCompleteAction::StartCodeDrill => self.start_code_drill(),
+                CodeDownloadCompleteAction::ReturnToSettings => self.go_to_settings(),
+            }
+        }
+    }
+    fn spawn_code_download_job(&mut self, language_key: &str, repo_idx: usize) {
+        let Some(lang) = language_by_key(language_key) else {
+            return;
+        };
+        let Some(repo) = lang.repos.get(repo_idx) else {
+            return;
+        };
+        self.code_intro_current_repo = repo.key.to_string();
+        self.code_intro_download_bytes = 0;
+        self.code_intro_download_bytes_total = 0;
+        let downloaded_bytes = Arc::new(AtomicU64::new(0));
+        let total_bytes = Arc::new(AtomicU64::new(0));
+        let done = Arc::new(AtomicBool::new(false));
+        let success = Arc::new(AtomicBool::new(false));
+        let dl_clone = Arc::clone(&downloaded_bytes);
+        let total_clone = Arc::clone(&total_bytes);
+        let done_clone = Arc::clone(&done);
+        let success_clone = Arc::clone(&success);
+        let cache_dir = self.code_intro_download_dir.clone();
+        let lang_key = language_key.to_string();
+        let snippets_limit = self.code_intro_snippets_per_repo;
+        // Get static references for thread
+        let repo_ref: &'static crate::generator::code_syntax::CodeRepo =
+            &lang.repos[repo_idx];
+        let block_style_ref: &'static crate::generator::code_syntax::BlockStyle =
+            &lang.block_style;
+        let handle = thread::spawn(move || {
+            let ok = download_code_repo_to_cache_with_progress(
+                &cache_dir,
+                &lang_key,
+                repo_ref,
+                block_style_ref,
+                snippets_limit,
+                |downloaded, total| {
+                    dl_clone.store(downloaded, Ordering::Relaxed);
+                    if let Some(total) = total {
+                        total_clone.store(total, Ordering::Relaxed);
+                    }
+                },
+            );
+            success_clone.store(ok, Ordering::Relaxed);
+            done_clone.store(true, Ordering::Relaxed);
+        });
+        self.code_download_job = Some(DownloadJob {
+            downloaded_bytes,
+            total_bytes,
+            done,
+            success,
+            handle: Some(handle),
+        });
+    }
     pub fn start_passage_drill(&mut self) {
         // Lazy source selection: choose a specific source for this drill and
         // download exactly one missing book when needed.
@@ -765,6 +1029,14 @@ impl App {
         self.passage_download_job = None;
     }
+    pub fn cancel_code_download(&mut self) {
+        self.code_download_queue.clear();
+        self.code_intro_downloading = false;
+        self.code_download_job = None;
+        self.code_drill_language_override = None;
+        self.code_download_attempted = false;
+    }
     pub fn start_passage_downloads_from_settings(&mut self) {
         self.go_to_passage_intro();
         self.passage_download_action = PassageDownloadCompleteAction::ReturnToSettings;
@@ -867,7 +1139,7 @@ impl App {
             done_clone.store(true, Ordering::Relaxed);
         });
-        self.passage_download_job = Some(PassageDownloadJob {
+        self.passage_download_job = Some(DownloadJob {
             downloaded_bytes,
             total_bytes,
             done,
@@ -900,21 +1172,37 @@ impl App {
                 self.config.word_count = (self.config.word_count + 5).min(100);
             }
             3 => {
-                let langs = ["rust", "python", "javascript", "go", "all"];
-                let idx = langs
+                let options = code_language_options();
+                let keys: Vec<&str> = options.iter().map(|(k, _)| *k).collect();
+                let idx = keys
                     .iter()
                     .position(|&l| l == self.config.code_language)
                     .unwrap_or(0);
-                let next = (idx + 1) % langs.len();
-                self.config.code_language = langs[next].to_string();
+                let next = (idx + 1) % keys.len();
+                self.config.code_language = keys[next].to_string();
             }
             4 => {
-                self.config.passage_downloads_enabled = !self.config.passage_downloads_enabled;
+                self.config.code_downloads_enabled = !self.config.code_downloads_enabled;
             }
             5 => {
                 // Editable text field handled directly in key handler.
             }
             6 => {
+                self.config.code_snippets_per_repo =
+                    match self.config.code_snippets_per_repo {
+                        0 => 1,
+                        n if n >= 200 => 0,
+                        n => n + 10,
+                    };
+            }
+            // 7 = Download Code Now (action button)
+            8 => {
+                self.config.passage_downloads_enabled = !self.config.passage_downloads_enabled;
+            }
+            9 => {
+                // Passage download dir - editable text field handled directly in key handler.
+            }
+            10 => {
                 self.config.passage_paragraphs_per_book =
                     match self.config.passage_paragraphs_per_book {
                         0 => 1,
@@ -950,21 +1238,37 @@ impl App {
                 self.config.word_count = self.config.word_count.saturating_sub(5).max(5);
             }
             3 => {
-                let langs = ["rust", "python", "javascript", "go", "all"];
-                let idx = langs
+                let options = code_language_options();
+                let keys: Vec<&str> = options.iter().map(|(k, _)| *k).collect();
+                let idx = keys
                     .iter()
                     .position(|&l| l == self.config.code_language)
                     .unwrap_or(0);
-                let next = if idx == 0 { langs.len() - 1 } else { idx - 1 };
-                self.config.code_language = langs[next].to_string();
+                let next = if idx == 0 { keys.len() - 1 } else { idx - 1 };
+                self.config.code_language = keys[next].to_string();
             }
             4 => {
-                self.config.passage_downloads_enabled = !self.config.passage_downloads_enabled;
+                self.config.code_downloads_enabled = !self.config.code_downloads_enabled;
             }
             5 => {
                 // Editable text field handled directly in key handler.
             }
             6 => {
+                self.config.code_snippets_per_repo =
+                    match self.config.code_snippets_per_repo {
+                        0 => 200,
+                        1 => 0,
+                        n => n.saturating_sub(10).max(1),
+                    };
+            }
+            // 7 = Download Code Now (action button)
+            8 => {
+                self.config.passage_downloads_enabled = !self.config.passage_downloads_enabled;
+            }
+            9 => {
+                // Passage download dir - editable text field handled directly in key handler.
+            }
+            10 => {
                 self.config.passage_paragraphs_per_book =
                     match self.config.passage_paragraphs_per_book {
                         0 => 500,


@@ -26,6 +26,14 @@ pub struct Config {
     pub passage_paragraphs_per_book: usize,
     #[serde(default = "default_passage_onboarding_done")]
     pub passage_onboarding_done: bool,
+    #[serde(default = "default_code_downloads_enabled")]
+    pub code_downloads_enabled: bool,
+    #[serde(default = "default_code_download_dir")]
+    pub code_download_dir: String,
+    #[serde(default = "default_code_snippets_per_repo")]
+    pub code_snippets_per_repo: usize,
+    #[serde(default = "default_code_onboarding_done")]
+    pub code_onboarding_done: bool,
 }
 fn default_target_wpm() -> u32 {
@@ -63,6 +71,23 @@ fn default_passage_paragraphs_per_book() -> usize {
 fn default_passage_onboarding_done() -> bool {
     false
 }
+fn default_code_downloads_enabled() -> bool {
+    false
+}
+fn default_code_download_dir() -> String {
+    dirs::data_dir()
+        .unwrap_or_else(|| PathBuf::from("."))
+        .join("keydr")
+        .join("code")
+        .to_string_lossy()
+        .to_string()
+}
+fn default_code_snippets_per_repo() -> usize {
+    200
+}
+fn default_code_onboarding_done() -> bool {
+    false
+}
 impl Default for Config {
     fn default() -> Self {
@@ -77,6 +102,10 @@ impl Default for Config {
             passage_download_dir: default_passage_download_dir(),
             passage_paragraphs_per_book: default_passage_paragraphs_per_book(),
             passage_onboarding_done: default_passage_onboarding_done(),
+            code_downloads_enabled: default_code_downloads_enabled(),
+            code_download_dir: default_code_download_dir(),
+            code_snippets_per_repo: default_code_snippets_per_repo(),
+            code_onboarding_done: default_code_onboarding_done(),
         }
     }
 }
@@ -114,4 +143,97 @@ impl Config {
     pub fn target_cpm(&self) -> f64 {
         self.target_wpm as f64 * 5.0
     }
+    /// Validate `code_language` against known options, resetting to default if invalid.
+    /// Call after deserialization to handle stale/renamed keys from old configs.
+    pub fn normalize_code_language(&mut self, valid_keys: &[&str]) {
+        // Backwards compatibility: old "shell" key is now "bash".
+        if self.code_language == "shell" {
+            self.code_language = "bash".to_string();
+        }
+        if !valid_keys.contains(&self.code_language.as_str()) {
+            self.code_language = default_code_language();
+        }
+    }
+}
+#[cfg(test)]
+mod tests {
+    use super::*;
+    #[test]
+    fn test_config_serde_defaults_from_empty() {
+        // Simulates loading an old config file with no code drill fields
+        let config: Config = toml::from_str("").unwrap();
+        assert_eq!(config.code_downloads_enabled, false);
+        assert_eq!(config.code_snippets_per_repo, 200);
+        assert_eq!(config.code_onboarding_done, false);
+        assert!(!config.code_download_dir.is_empty());
+        assert!(config.code_download_dir.contains("code"));
+    }
+    #[test]
+    fn test_config_serde_defaults_from_old_fields_only() {
+        // Simulates a config file that only has pre-existing fields
+        let toml_str = r#"
+target_wpm = 60
+theme = "monokai"
+code_language = "go"
+"#;
+        let config: Config = toml::from_str(toml_str).unwrap();
+        assert_eq!(config.target_wpm, 60);
+        assert_eq!(config.theme, "monokai");
+        assert_eq!(config.code_language, "go");
+        // New fields should have defaults
+        assert_eq!(config.code_downloads_enabled, false);
+        assert_eq!(config.code_snippets_per_repo, 200);
+        assert_eq!(config.code_onboarding_done, false);
+    }
+    #[test]
+    fn test_config_serde_roundtrip() {
+        let config = Config::default();
+        let serialized = toml::to_string_pretty(&config).unwrap();
+        let deserialized: Config = toml::from_str(&serialized).unwrap();
+        assert_eq!(config.code_downloads_enabled, deserialized.code_downloads_enabled);
+        assert_eq!(config.code_download_dir, deserialized.code_download_dir);
+        assert_eq!(config.code_snippets_per_repo, deserialized.code_snippets_per_repo);
+        assert_eq!(config.code_onboarding_done, deserialized.code_onboarding_done);
+    }
+    #[test]
+    fn test_normalize_code_language_valid_key_unchanged() {
+        let mut config = Config::default();
+        config.code_language = "python".to_string();
+        let valid_keys = vec!["rust", "python", "javascript", "go", "all"];
+        config.normalize_code_language(&valid_keys);
+        assert_eq!(config.code_language, "python");
+    }
+    #[test]
+    fn test_normalize_code_language_invalid_key_resets() {
+        let mut config = Config::default();
+        config.code_language = "haskell".to_string();
+        let valid_keys = vec!["rust", "python", "javascript", "go", "all"];
+        config.normalize_code_language(&valid_keys);
+        assert_eq!(config.code_language, "rust");
+    }
+    #[test]
+    fn test_normalize_code_language_empty_string_resets() {
+        let mut config = Config::default();
+        config.code_language = String::new();
+        let valid_keys = vec!["rust", "python", "javascript", "go", "all"];
+        config.normalize_code_language(&valid_keys);
+        assert_eq!(config.code_language, "rust");
+    }
+    #[test]
+    fn test_normalize_code_language_shell_maps_to_bash() {
+        let mut config = Config::default();
+        config.code_language = "shell".to_string();
+        let valid_keys = vec!["rust", "python", "javascript", "go", "bash", "all"];
+        config.normalize_code_language(&valid_keys);
+        assert_eq!(config.code_language, "bash");
+    }
 }


@@ -3,10 +3,12 @@ use std::fs;
 use std::io::Read;
 use std::path::PathBuf;
+#[allow(dead_code)]
 pub struct DiskCache {
     base_dir: PathBuf,
 }
+#[allow(dead_code)]
 impl DiskCache {
     pub fn new(subdir: &str) -> Option<Self> {
         let base = dirs::data_dir()?.join("keydr").join(subdir);
@@ -37,6 +39,7 @@ impl DiskCache {
     }
 }
+#[allow(dead_code)]
 #[cfg(feature = "network")]
 pub fn fetch_url(url: &str) -> Option<String> {
     let client = reqwest::blocking::Client::builder()
@@ -51,6 +54,7 @@ pub fn fetch_url(url: &str) -> Option<String> {
     }
 }
+#[allow(dead_code)]
 #[cfg(not(feature = "network"))]
 pub fn fetch_url(_url: &str) -> Option<String> {
     None

File diff suppressed because it is too large


@@ -1,41 +0,0 @@
-use crate::engine::filter::CharFilter;
-use crate::generator::TextGenerator;
-#[allow(dead_code)]
-pub struct GitHubCodeGenerator {
-    cached_snippets: Vec<String>,
-    current_idx: usize,
-}
-impl GitHubCodeGenerator {
-    #[allow(dead_code)]
-    pub fn new() -> Self {
-        Self {
-            cached_snippets: Vec::new(),
-            current_idx: 0,
-        }
-    }
-}
-impl Default for GitHubCodeGenerator {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-impl TextGenerator for GitHubCodeGenerator {
-    fn generate(
-        &mut self,
-        _filter: &CharFilter,
-        _focused: Option<char>,
-        _word_count: usize,
-    ) -> String {
-        if self.cached_snippets.is_empty() {
-            return "// GitHub code fetching not yet configured. Use settings to add a repository."
-                .to_string();
-        }
-        let snippet = self.cached_snippets[self.current_idx % self.cached_snippets.len()].clone();
-        self.current_idx += 1;
-        snippet
-    }
-}


@@ -3,7 +3,6 @@ pub mod capitalize;
 pub mod code_patterns;
 pub mod code_syntax;
 pub mod dictionary;
-pub mod github_code;
 pub mod numbers;
 pub mod passage;
 pub mod phonetic;

File diff suppressed because it is too large


@@ -218,6 +218,51 @@ mod tests {
         assert_eq!(drill.typo_count(), 1);
     }
+    #[test]
+    fn test_tab_counts_as_four_spaces() {
+        let mut drill = DrillState::new("    pass");
+        let start = drill.cursor;
+        input::process_char(&mut drill, '\t');
+        assert_eq!(drill.cursor, start + 4);
+        assert_eq!(drill.typo_count(), 0);
+    }
+    #[test]
+    fn test_tab_counts_as_two_spaces() {
+        let mut drill = DrillState::new("  echo");
+        let start = drill.cursor;
+        input::process_char(&mut drill, '\t');
+        assert_eq!(drill.cursor, start + 2);
+        assert_eq!(drill.typo_count(), 0);
+    }
+    #[test]
+    fn test_tab_not_accepted_for_non_four_space_prefix() {
+        let mut drill = DrillState::new("abc def");
+        for ch in "abc".chars() {
+            input::process_char(&mut drill, ch);
+        }
+        let start = drill.cursor;
+        input::process_char(&mut drill, '\t');
+        // Falls back to synthetic incorrect span behavior.
+        assert!(drill.cursor > start);
+        assert!(drill.typo_count() >= 1);
+    }
+    #[test]
+    fn test_correct_enter_auto_indents_next_line() {
+        let mut drill = DrillState::new("if x:\n    pass");
+        for ch in "if x:".chars() {
+            input::process_char(&mut drill, ch);
+        }
+        // Correct newline should also consume the 4-space indent.
+        input::process_char(&mut drill, '\n');
+        let expected_cursor = "if x:\n    ".chars().count();
+        assert_eq!(drill.cursor, expected_cursor);
+        assert_eq!(drill.typo_count(), 0);
+        assert_eq!(drill.accuracy(), 100.0);
+    }
     #[test]
     fn test_nested_synthetic_spans_collapse_to_single_error() {
         let mut drill = DrillState::new("abcd\nefgh");


@@ -27,7 +27,13 @@ pub fn process_char(drill: &mut DrillState, ch: char) -> Option<KeystrokeEvent>
     }
     let expected = drill.target[drill.cursor];
-    let correct = ch == expected;
+    let tab_indent_len = if ch == '\t' {
+        tab_indent_completion_len(drill)
+    } else {
+        0
+    };
+    let tab_as_indent = tab_indent_len > 0;
+    let correct = ch == expected || tab_as_indent;
     let event = KeystrokeEvent {
         expected,
@@ -36,9 +42,16 @@ pub fn process_char(drill: &mut DrillState, ch: char) -> Option<KeystrokeEvent>
         correct,
     };
-    if correct {
+    if tab_as_indent {
+        apply_tab_indent(drill, tab_indent_len);
+    } else if correct {
         drill.input.push(CharStatus::Correct);
         drill.cursor += 1;
+        // IDE-like behavior: when Enter is correctly typed, auto-consume
+        // indentation whitespace on the next line.
+        if ch == '\n' {
+            apply_auto_indent_after_newline(drill);
+        }
     } else if ch == '\n' {
         apply_newline_span(drill, ch);
     } else if ch == '\t' {
@@ -56,6 +69,63 @@ pub fn process_char(drill: &mut DrillState, ch: char) -> Option<KeystrokeEvent>
     Some(event)
 }
+fn tab_indent_completion_len(drill: &DrillState) -> usize {
+    if drill.cursor >= drill.target.len() {
+        return 0;
+    }
+    // Only treat Tab as indentation if cursor is in leading whitespace
+    // for the current line.
+    let line_start = drill.target[..drill.cursor]
+        .iter()
+        .rposition(|&c| c == '\n')
+        .map(|idx| idx + 1)
+        .unwrap_or(0);
+    if drill.target[line_start..drill.cursor]
+        .iter()
+        .any(|&c| c != ' ' && c != '\t')
+    {
+        return 0;
+    }
+    let line_end = drill.target[drill.cursor..]
+        .iter()
+        .position(|&c| c == '\n')
+        .map(|offset| drill.cursor + offset)
+        .unwrap_or(drill.target.len());
+    let mut end = drill.cursor;
+    while end < line_end {
+        let c = drill.target[end];
+        if c == ' ' || c == '\t' {
+            end += 1;
+        } else {
+            break;
+        }
+    }
+    end.saturating_sub(drill.cursor)
+}
+fn apply_tab_indent(drill: &mut DrillState, len: usize) {
+    for _ in 0..len {
+        drill.input.push(CharStatus::Correct);
+    }
+    drill.cursor = drill.cursor.saturating_add(len);
+}
+fn apply_auto_indent_after_newline(drill: &mut DrillState) {
+    while drill.cursor < drill.target.len() {
+        let c = drill.target[drill.cursor];
+        if c == ' ' || c == '\t' {
+            drill.input.push(CharStatus::Correct);
+            drill.cursor += 1;
+        } else {
+            break;
+        }
+    }
+}
 pub fn process_backspace(drill: &mut DrillState) {
     if drill.cursor == 0 {
         return;


@@ -82,21 +82,17 @@ impl AppLayout {
 }
 pub fn centered_rect(percent_x: u16, percent_y: u16, area: Rect) -> Rect {
-    let vertical = Layout::default()
-        .direction(Direction::Vertical)
-        .constraints([
-            Constraint::Percentage((100 - percent_y) / 2),
-            Constraint::Percentage(percent_y),
-            Constraint::Percentage((100 - percent_y) / 2),
-        ])
-        .split(area);
-    Layout::default()
-        .direction(Direction::Horizontal)
-        .constraints([
-            Constraint::Percentage((100 - percent_x) / 2),
-            Constraint::Percentage(percent_x),
-            Constraint::Percentage((100 - percent_x) / 2),
-        ])
-        .split(vertical[1])[1]
+    const MIN_POPUP_WIDTH: u16 = 72;
+    const MIN_POPUP_HEIGHT: u16 = 18;
+    let requested_w = area.width.saturating_mul(percent_x.min(100)) / 100;
+    let requested_h = area.height.saturating_mul(percent_y.min(100)) / 100;
+    let target_w = requested_w.max(MIN_POPUP_WIDTH).min(area.width);
+    let target_h = requested_h.max(MIN_POPUP_HEIGHT).min(area.height);
+    let left = area.x.saturating_add((area.width.saturating_sub(target_w)) / 2);
+    let top = area.y.saturating_add((area.height.saturating_sub(target_h)) / 2);
+    Rect::new(left, top, target_w, target_h)
 }