Filesystem Scan Cache Architecture
Filesystem Scan Cache Architecture Contract
This document defines the current contract for the shared filesystem scan cache implemented in Rust (crates/pi-natives/src/fs_cache.rs) and consumed by native discovery/search APIs exposed to packages/coding-agent.
What this cache is
The cache stores full directory-scan entry lists (GlobMatch[]) keyed by scan scope and traversal policy, then lets higher-level operations (glob filtering, fuzzy scoring, grep file selection) run against those cached entries.
Primary goals:
- avoid repeated filesystem walks for repeated discovery/search calls
- keep consistency across glob, fuzzyFind, and grep when they share the same scan policy
- allow explicit staleness recovery for empty results and explicit invalidation after file mutations
Ownership and public surface
- Cache implementation and policy:
  - crates/pi-natives/src/fs_cache.rs
- Native consumers:
  - crates/pi-natives/src/glob.rs
  - crates/pi-natives/src/fd.rs (fuzzyFind)
  - crates/pi-natives/src/grep.rs
- JS binding/export:
  - packages/natives/src/glob/index.ts (invalidateFsScanCache)
  - packages/natives/src/glob/types.ts
  - packages/natives/src/grep/types.ts
- Coding-agent mutation invalidation helpers:
  - packages/coding-agent/src/tools/fs-cache-invalidation.ts
Cache key partitioning (hard contract)
Each entry is keyed by:
- canonicalized root directory path
- include_hidden boolean
- use_gitignore boolean
Implications:
- Hidden and non-hidden scans do not share entries.
- Gitignore-respecting and ignore-disabled scans do not share entries.
- Consumers must pass stable semantics for hidden/gitignore behavior; changing either flag creates a different cache partition.
node_modules inclusion is not in the cache key. The cache stores entries with node_modules included; per-consumer filtering is applied after retrieval.
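The partitioning rules above can be sketched with a small key type. This is a minimal illustration, not the code in fs_cache.rs: the names ScanKey, ScanCacheMap, and without_node_modules are hypothetical, and entries are reduced to '/'-separated path strings.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical key type mirroring the three documented key dimensions.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ScanKey {
    /// Canonicalized root directory of the scan.
    root: PathBuf,
    include_hidden: bool,
    use_gitignore: bool,
}

/// Hypothetical cache shape: one entry list per key (entries simplified to path strings).
type ScanCacheMap = HashMap<ScanKey, Vec<String>>;

/// Per-consumer filter applied after retrieval. node_modules is not part of
/// the key, so the cached list always contains those entries.
fn without_node_modules(entries: &[String]) -> Vec<String> {
    entries
        .iter()
        .filter(|path| !path.split('/').any(|segment| segment == "node_modules"))
        .cloned()
        .collect()
}
```

Because both flags are part of the key, a lookup that flips include_hidden or use_gitignore misses the existing entry and triggers its own scan.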
Scan collection behavior
Cache population uses a deterministic walker (ignore::WalkBuilder) configured by include_hidden and use_gitignore:
- follow_links(false)
- sorted by file path
- .git is always skipped
- node_modules is always collected at cache-scan time (and optionally filtered later)
- entry file type + mtime are captured via symlink_metadata
Search roots are resolved by resolve_search_path:
- relative paths are resolved against current cwd
- target must be an existing directory
- root is canonicalized when possible
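The three resolution rules can be approximated with the standard library alone. The function below is a sketch under assumptions: resolve_search_path_sketch is a hypothetical name, and the real resolve_search_path may report errors or handle edge cases differently.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Sketch of the documented resolution rules (not the real resolve_search_path).
fn resolve_search_path_sketch(input: &Path) -> io::Result<PathBuf> {
    // Relative paths are resolved against the current working directory.
    let absolute = if input.is_absolute() {
        input.to_path_buf()
    } else {
        env::current_dir()?.join(input)
    };

    // The target must be an existing directory.
    if !absolute.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("not a directory: {}", absolute.display()),
        ));
    }

    // Canonicalize when possible; otherwise keep the absolute path as-is.
    match absolute.canonicalize() {
        Ok(canonical) => Ok(canonical),
        Err(_) => Ok(absolute),
    }
}
```

Canonicalizing the root matters for the cache key: two spellings of the same directory (for example via a symlink or a `..` segment) should land in the same partition.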
Freshness and eviction policy
Global policy (environment-overridable):
- FS_SCAN_CACHE_TTL_MS (default 1000)
- FS_SCAN_EMPTY_RECHECK_MS (default 200)
- FS_SCAN_CACHE_MAX_ENTRIES (default 16)
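One way to read such a policy from the environment is sketched below. The variable names and defaults come from the list above; the struct, the helper names, and the choice to fall back to the default on unparsable input are assumptions, not documented behavior.

```rust
use std::env;
use std::time::Duration;

/// Parse an override, falling back to `default` when the value is missing or
/// not a valid unsigned integer (the fallback on invalid input is an assumption).
fn parse_or_default(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|value| value.trim().parse::<u64>().ok())
        .unwrap_or(default)
}

/// Read one environment variable through `parse_or_default`.
fn env_u64(name: &str, default: u64) -> u64 {
    parse_or_default(env::var(name).ok().as_deref(), default)
}

/// Hypothetical container for the three global knobs.
struct ScanCachePolicy {
    ttl: Duration,
    empty_recheck: Duration,
    max_entries: usize,
}

fn policy_from_env() -> ScanCachePolicy {
    ScanCachePolicy {
        ttl: Duration::from_millis(env_u64("FS_SCAN_CACHE_TTL_MS", 1000)),
        empty_recheck: Duration::from_millis(env_u64("FS_SCAN_EMPTY_RECHECK_MS", 200)),
        max_entries: env_u64("FS_SCAN_CACHE_MAX_ENTRIES", 16) as usize,
    }
}
```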
Behavior:
- get_or_scan(...)
  - if TTL is 0: bypass cache entirely, always fresh scan (cache_age_ms = 0)
  - on cache hit within TTL: return cached entries + non-zero cache_age_ms
  - on expired hit: evict key, rescan, store fresh entry
- max entry enforcement is oldest-first eviction by created_at
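The freshness and eviction rules can be modeled in a few lines. This is a simplified single-threaded sketch, not the DashMap-based implementation: the key is reduced to a string, the clock is passed in as `now` so the behavior is easy to test, and ScanCacheSketch plus its fields are hypothetical names. Clamping the age of a hit to at least 1 ms and treating an age equal to the TTL as expired are assumptions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct CachedScan {
    entries: Vec<String>,
    created_at: Instant,
}

struct ScanResult {
    entries: Vec<String>,
    cache_age_ms: u64,
}

struct ScanCacheSketch {
    ttl: Duration,
    max_entries: usize,
    map: HashMap<String, CachedScan>,
}

impl ScanCacheSketch {
    fn new(ttl: Duration, max_entries: usize) -> Self {
        Self { ttl, max_entries, map: HashMap::new() }
    }

    fn get_or_scan<F>(&mut self, key: &str, now: Instant, scan: F) -> ScanResult
    where
        F: FnOnce() -> Vec<String>,
    {
        // TTL of zero bypasses the cache entirely: nothing is read or stored.
        if self.ttl.is_zero() {
            return ScanResult { entries: scan(), cache_age_ms: 0 };
        }
        if let Some(hit) = self.map.get(key) {
            let age = now.saturating_duration_since(hit.created_at);
            if age < self.ttl {
                // Hit within TTL: report a non-zero age (clamped to 1 ms here).
                let age_ms = (age.as_millis() as u64).max(1);
                return ScanResult { entries: hit.entries.clone(), cache_age_ms: age_ms };
            }
        }
        // Miss or expired hit: evict the key, rescan, store the fresh entry.
        self.map.remove(key);
        let entries = scan();
        self.map.insert(
            key.to_string(),
            CachedScan { entries: entries.clone(), created_at: now },
        );
        self.evict_oldest();
        ScanResult { entries, cache_age_ms: 0 }
    }

    /// Oldest-first eviction by created_at until the size limit holds.
    fn evict_oldest(&mut self) {
        while self.map.len() > self.max_entries {
            let oldest = self
                .map
                .iter()
                .min_by_key(|(_, scan)| scan.created_at)
                .map(|(key, _)| key.clone());
            match oldest {
                Some(key) => {
                    self.map.remove(&key);
                }
                None => break,
            }
        }
    }
}
```

Callers can tell a fresh scan from a cache hit by checking whether cache_age_ms is zero, which is what the empty-result recheck relies on.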
Empty-result fast recheck (separate from normal hits)
Normal cache hit:
- a cache hit inside TTL returns cached entries and does nothing else.
Empty-result fast recheck:
- this is a caller-side policy using ScanResult.cache_age_ms
- if filtered/query result is empty and cached scan age is at least empty_recheck_ms(), caller performs one force_rescan(...) and retries
- intended to reduce stale-negative results when files were recently added but cache is still within TTL
Current consumers:
- glob: rechecks when filtered matches are empty and scan age is at least the recheck threshold
- fuzzyFind (fd.rs): rechecks only when query is non-empty and scored matches are empty
- grep: rechecks when selected candidate file list is empty
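The caller-side recheck can be sketched as a small wrapper around the filtering step. The function name, the closure parameters, and the simplified ScanResult are hypothetical; the real consumers call force_rescan(...) with their own arguments.

```rust
/// Hypothetical stand-in for the documented ScanResult.
struct ScanResult {
    entries: Vec<String>,
    cache_age_ms: u64,
}

/// Filter a scan; if nothing matches and the scan is at least
/// `empty_recheck_ms` old, rescan once and filter again.
fn filter_with_empty_recheck<F, R>(
    cached: ScanResult,
    empty_recheck_ms: u64,
    matches: F,
    force_rescan: R,
) -> Vec<String>
where
    F: Fn(&str) -> bool,
    R: FnOnce() -> ScanResult,
{
    let apply = |scan: &ScanResult| -> Vec<String> {
        scan.entries
            .iter()
            .filter(|entry| matches(entry.as_str()))
            .cloned()
            .collect()
    };

    let first = apply(&cached);
    // Normal path: a non-empty result, or a scan too young to be suspicious,
    // is returned without touching the filesystem again.
    if !first.is_empty() || cached.cache_age_ms < empty_recheck_ms {
        return first;
    }

    // Stale-negative path: exactly one forced rescan, then retry the filter.
    let fresh = force_rescan();
    apply(&fresh)
}
```

Taking force_rescan as FnOnce makes the "retry once" rule structural: the wrapper cannot loop on repeated empty results.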
Consumer defaults and cache usage
Cache is opt-in on all exposed APIs (cache?: boolean, default false).
Current defaults in native APIs:
- glob: hidden=false, gitignore=true, cache=false
- fuzzyFind: hidden=false, gitignore=true, cache=false
- grep: hidden=true, cache=false, and cache scan always uses use_gitignore=true
Coding-agent callers today:
- High-volume mention candidate discovery enables cache:
  - packages/coding-agent/src/utils/file-mentions.ts
  - profile: hidden=true, gitignore=true, includeNodeModules=true, cache=true
- Tool-level grep integration currently disables scan cache (cache: false):
  - packages/coding-agent/src/tools/grep.ts
Invalidation contract
Native invalidation entrypoint:
- invalidateFsScanCache(path?: string)
  - with path: remove cache entries whose root is a prefix of target path
  - without path: clear all scan cache entries
Path handling details:
- relative invalidation paths are resolved against cwd
- invalidation attempts canonicalization
- if target does not exist (e.g., delete), fallback canonicalizes parent and reattaches filename when possible
- this preserves invalidation behavior for create/delete/rename where one side may not exist
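The prefix rule and the missing-target fallback can be sketched as follows. This is an illustration only: the cache is reduced to a map keyed by root path, both function names are hypothetical, and returning the uncanonicalized path when the parent cannot be canonicalized either is an assumption.

```rust
use std::collections::HashMap;
use std::env;
use std::path::{Path, PathBuf};

/// Resolve an invalidation target: make it absolute, canonicalize it, and if it
/// does not exist, canonicalize the parent and reattach the file name.
fn canonicalize_for_invalidation(target: &Path) -> PathBuf {
    // Relative paths are resolved against the current working directory.
    let absolute = if target.is_absolute() {
        target.to_path_buf()
    } else {
        match env::current_dir() {
            Ok(cwd) => cwd.join(target),
            Err(_) => target.to_path_buf(),
        }
    };
    if let Ok(canonical) = absolute.canonicalize() {
        return canonical;
    }
    // Target is missing (for example after a delete): fall back to the parent.
    match (absolute.parent(), absolute.file_name()) {
        (Some(parent), Some(name)) => match parent.canonicalize() {
            Ok(canonical_parent) => canonical_parent.join(name),
            Err(_) => absolute.clone(),
        },
        _ => absolute.clone(),
    }
}

/// Remove every cached scan whose root is a prefix of `target`;
/// `None` clears the whole cache.
fn invalidate(cache: &mut HashMap<PathBuf, Vec<String>>, target: Option<&Path>) {
    match target {
        None => cache.clear(),
        // Path::starts_with compares whole components, so /repo is not
        // treated as a prefix of /repository.
        Some(path) => cache.retain(|root, _| !path.starts_with(root)),
    }
}
```

Because a mutated file affects every scan rooted at any of its ancestors, a single write can drop several entries at once.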
Coding-agent mutation flow responsibilities
Coding-agent code must invalidate after successful filesystem mutations.
Central helpers:
- invalidateFsScanAfterWrite(path)
- invalidateFsScanAfterDelete(path)
- invalidateFsScanAfterRename(oldPath, newPath) (invalidates both sides when paths differ)
Current mutation tool callsites:
- packages/coding-agent/src/tools/write.ts
- packages/coding-agent/src/patch/index.ts (hashline/patch/replace flows)
Rule: if a flow mutates filesystem content or location and bypasses these helpers, cache staleness bugs are expected.
Adding a new cache consumer safely
When introducing cache use in a new scanner/search path:
- Use stable scan policy inputs
  - decide hidden/gitignore semantics first
  - pass them consistently to get_or_scan/force_rescan so cache partitions are intentional
- Treat cache data as pre-filtered only by traversal policy
  - apply tool-specific filtering (glob patterns, type filters, node_modules rules) after retrieval
  - never assume cached entries already reflect your higher-level filters
- Implement empty-result fast recheck only for stale-negative risk
  - use scan.cache_age_ms >= empty_recheck_ms()
  - retry once with force_rescan(..., store=true, ...)
  - keep this path separate from normal cache-hit logic
- Respect no-cache mode explicitly
  - when caller disables cache, call force_rescan(..., store=false, ...)
  - do not populate shared cache in a no-cache request path
- Wire mutation invalidation for any new write path
  - after successful write/edit/delete/rename, call the coding-agent invalidation helper
  - for rename/move, invalidate both old and new paths
- Do not add per-call TTL knobs
  - current contract is global policy only (env-configured), no per-request TTL override
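A new consumer following the policy, filtering, and no-cache rules might look roughly like the sketch below. Everything here is hypothetical: MiniCache stands in for the shared cache with a fixed fake walk, its get_or_scan/force_rescan signatures are simplified guesses, and find_by_extension is an invented consumer. TTL handling and the empty-result recheck are left out to keep the example short.

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Eq, Hash)]
struct Policy {
    root: String,
    include_hidden: bool,
    use_gitignore: bool,
}

/// Hypothetical stand-in for the shared cache.
struct MiniCache {
    map: HashMap<Policy, Vec<String>>,
    scans: usize, // counts filesystem walks, for illustration
}

impl MiniCache {
    /// Pretend filesystem walk; always includes node_modules entries.
    fn scan(&mut self, _policy: &Policy) -> Vec<String> {
        self.scans += 1;
        vec![
            "src/lib.rs".to_string(),
            "node_modules/dep/index.js".to_string(),
            "README.md".to_string(),
        ]
    }

    fn get_or_scan(&mut self, policy: &Policy) -> Vec<String> {
        if let Some(hit) = self.map.get(policy) {
            return hit.clone();
        }
        self.force_rescan(policy, true)
    }

    /// Always walks; only writes to the shared map when `store` is true.
    fn force_rescan(&mut self, policy: &Policy, store: bool) -> Vec<String> {
        let entries = self.scan(policy);
        if store {
            self.map.insert(policy.clone(), entries.clone());
        }
        entries
    }
}

/// Hypothetical consumer: list files under `root` with a given extension.
fn find_by_extension(cache: &mut MiniCache, root: &str, ext: &str, use_cache: bool) -> Vec<String> {
    // Stable policy: hidden/gitignore semantics are fixed once for this consumer.
    let policy = Policy { root: root.to_string(), include_hidden: false, use_gitignore: true };

    // No-cache mode never populates the shared cache.
    let entries = if use_cache {
        cache.get_or_scan(&policy)
    } else {
        cache.force_rescan(&policy, false)
    };

    // Tool-specific filtering happens after retrieval, never inside the cache.
    entries
        .into_iter()
        .filter(|path| !path.starts_with("node_modules/"))
        .filter(|path| path.ends_with(ext))
        .collect()
}
```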
Known boundaries
- Cache scope is process-local in-memory (DashMap), not persisted across process restarts.
- Cache stores scan entries, not final tool results.
- glob/fuzzyFind/grep share scan entries only when key dimensions (root, hidden, gitignore) match.
- .git is always excluded at scan collection time regardless of caller options.