🚀 High-Performance C++20 Distributed Cache
🎯 The Why
I built this cache to solve production bottlenecks found in traditional solutions like Redis and Memcached:
- Beyond Single-Threaded: Moves past the Redis single-threaded event loop and Memcached’s per-slab locking.
- Fine-Grained Concurrency: Implements reader-writer locking for high-throughput parallel access.
- Intelligence: Distinguishes between "one-hit wonders" and frequently accessed keys.
- Stampede Protection: Built-in Request Coalescing to collapse duplicate backend queries when a hot key expires.
🛠 The C++ Decisions
C++20 was chosen for deterministic memory layout and zero-cost abstractions.
Key Language Features Utilized:
| Feature | Purpose |
|---|---|
| std::shared_mutex | Striped reader-writer locking. |
| std::shared_future | Coalescing concurrent requests to prevent "thundering herds." |
| std::from_chars | Zero-allocation number parsing for the RESP protocol. |
| Move Semantics | Ensures the SET path is allocation-free after initial construction. |
| RAII | Exception-safe mutex management via std::unique_lock. |
Why not other languages? > * Go: Unpredictable P99 latency due to the goroutine scheduler.
🛰 The Protocol Layer
A breakdown of the lifecycle of a SET command:
- Ingress: Client sends RESP frame: 3\\r\\n...3\\r\\nfoo\\r\\n$3\\r\\nbar\\r\\n.
- Detection: The accept loop (using select() with 200ms timeout) spawns a HandleRespClient thread.
- Parsing: ParseRESP validates array counts (<1024) and string lengths (<1MB).
- Execution: ExecuteCommand hashes the key to a stripe index, acquires a unique_lock, and inserts the ValueEntry.
- Cleanup: EnforceCapacity() triggers a 3-phase eviction if limits are exceeded.
- Egress: Returns +OK\\r\\n via SendAll to handle partial writes.
🧠 The Memory Model & Eviction
The store is divided into 64 independent stripes to minimize lock contention.
ValueEntry Structure:
- std::string value (Standard heap).
- std::optional<steady_clock::time_point> TTL.
- uint32_t LFU counter (starts at 1, saturates at max).
- last_touch_ns timestamp.
- in_protected_segment flag (Hot/Cold separation).
The Three-Phase Eviction Process:
- Phase 1 (TTL): Scan stripes and erase expired entries.
- Phase 2 (Demotion): If the "Protected Segment" (hot half) is full, demote coldest entries to "Probationary."
- Phase 3 (Eviction): Rank probationary candidates by (segment, lfu_count, last_touch_ns) and evict the lowest score.
📈 Honest Benchmarks
- Throughput: 3.66M ops/sec (In-process, single thread).
- Latency: < 0.001 ms (p99).
- Deduplication: 87.5% ratio (7 out of 8 redundant requests collapsed during bursts).
- Binary Size: 122 KB (dynamically linked).
- Build Time: ~2 seconds.
🔮 What's Next?
- Modern I/O: Moving from thread-per-connection to io_uring or epoll for 10k+ concurrent connections.
- Persistence: Transitioning the WAL from in-memory to a file-backed append-only log with fdatasync().
- Enhanced RESP: Mapping the SET command's optional arguments to the existing internal TTL infrastructure.
x-TheFox/Distributed_Cache
A high-performance C++ distributed in-memory cache focused on throughput, adaptive in-memory eviction, and operational visibility. The core targets Redis-class workloads while showcasing sharding, replication, and control-plane coordination suitable for portfolio demonstration.