Architecture & Optimizations
Understanding how Pebble achieves exceptional performance
Virtual Machine Overview
Pebble uses a hand-crafted bytecode-based virtual machine built from the ground up for performance. Similar to languages like Python and Lua, the execution pipeline consists of three main stages:
1. Lexical Analysis & Parsing
Source code is tokenized and parsed into an Abstract Syntax Tree (AST). The parser performs semantic analysis and type checking during this phase.
2. Bytecode Compilation
The AST is compiled into compact bytecode instructions. The compiler performs constant folding and other optimizations at compile-time.
3. Virtual Machine Execution
The bytecode is executed by a stack-based virtual machine with several performance optimizations.
Bytecode Compilation
Pebble compiles source code to bytecode before execution, providing several advantages:
- Single-pass compilation: Fast compilation with minimal overhead
- Compact representation: Bytecode is smaller than source code and faster to execute
- Compile-time optimizations: Constant folding, dead code elimination
- Fast local loads: Special opcodes for loading locals 0-7 without operands
Optimization: String Literal Folding
When concatenating string literals at compile-time, the compiler folds them into a single constant:
let msg = "Hello, " + "World!" # Compiled as single string "Hello, World!"
NaN Tagging
One of Pebble's most significant optimizations is NaN tagging - a technique that allows storing different value types in a single 64-bit word without heap allocation.
How It Works
IEEE 754 double-precision floating point numbers use 64 bits. Not all bit patterns represent valid numbers - some represent "NaN" (Not a Number). Pebble uses these NaN bit patterns to encode different types:
- Doubles: Stored directly (non-NaN bit patterns)
- Integers: Stored in NaN payload (up to 48 bits)
- Booleans: Encoded in NaN with type tag
- Nil: Special NaN pattern
- Pointers: Object references stored in NaN payload
Performance Benefits
- Zero allocation for primitives: Numbers, bools, nil don't require heap memory
- Fast type checking: Type can be determined with a few bit operations
- Compact representation: All values fit in 64 bits (8 bytes)
- Cache friendly: Values fit in CPU registers and cache lines
Real-world impact: NaN tagging eliminates millions of allocations in typical programs, reducing garbage collection pressure and improving cache locality.
String Optimization
Pebble employs several techniques to make string operations exceptionally fast:
1. Lazy Hash Computation
String hashes are only computed when a string is used as a map key, not on creation:
- String concatenation and creation: No hashing overhead
- Map insertion/lookup: Hash computed once and cached
- Result: 938x faster string concatenation vs. eager hashing
2. Reusable String Builder
String concatenation uses a reusable buffer to avoid repeated allocations:
- Single builder allocated per VM instance
- Buffer grows as needed and maintains capacity
- Reset between operations (no reallocation)
3. String Interning for Literals
String literals are interned at compile-time:
- Each unique string literal stored once
- Equality comparison: pointer comparison (O(1))
- Reduced memory footprint
O(1) Symbol Lookup
Pebble uses compile-time symbol resolution for blazing-fast variable access:
Local Variables
- Resolved to stack indices at compile-time
- Runtime access: single array lookup (O(1))
- No hash table lookups or string comparisons
- Special opcodes for accessing locals 0-7 (most common case)
Global Variables
- Stored in a flat array with indices known at compile-time
- Access via direct array index (O(1))
- No runtime symbol table searches
Comparison: Many interpreted languages use hash tables for variable lookup (O(1) average, but with overhead). Pebble's compile-time resolution eliminates this overhead entirely.
Memory Management
Pebble uses a tri-color mark-and-sweep garbage collector - a simple yet effective approach:
Heap-Allocated Objects
- Strings: Immutable, reference-counted implicitly through GC
- Arrays: Dynamic arrays with automatic growth
- Maps: Hash tables with linear probing
- Records: Fixed-size structures with typed fields
- Functions: Function objects with bytecode and metadata
Garbage Collection Strategy
- Mark phase: Trace reachable objects from stack and globals
- Sweep phase: Free unmarked objects
- Triggered: When memory pressure increases
- Conservative: Keeps objects longer to reduce GC frequency
Array Growth Strategy
Dynamic arrays double in capacity when full, minimizing reallocations:
- Amortized O(1) append operations
- Reduced memory allocation overhead
- Better cache locality with contiguous storage
Performance Results
These optimizations combine to deliver exceptional real-world performance:
String Concatenation (50K × 5 strings)
- Pebble: 25-39ms
- Python: 2858ms (114x slower)
- Lua: 1020ms (40x slower)
Array Operations (100K push/pop + 1K insert/remove)
- Pebble: 38-45ms
- Python: 39-46ms (≈ equal)
- Lua: 874ms (22x slower)
Binary Trees (depth 20, 2M+ allocations)
- Pebble: 583ms
- Python: 2633ms (4.5x slower)
- Lua: 667ms (1.1x slower)
Map Operations (100K set/get/delete)
- Pebble: 133ms
- Python: 137ms (≈ equal)
- Lua: 197ms (1.5x slower)
All benchmarks run on Linux x86_64 with compiler optimizations enabled.
Summary
Pebble's performance comes from a combination of well-established techniques applied carefully:
- ✓ NaN tagging eliminates primitive allocations
- ✓ Lazy hash computation avoids unnecessary work
- ✓ Compile-time symbol resolution for O(1) variable access
- ✓ Reusable buffers minimize allocations
- ✓ Bytecode compilation enables optimizations
- ✓ Simple GC with low overhead
These optimizations make Pebble suitable for both educational use and real-world applications where performance matters.