Architecture & Optimizations

Understanding how Pebble achieves exceptional performance

Virtual Machine Overview

Pebble uses a hand-crafted bytecode-based virtual machine built from the ground up for performance. Similar to languages like Python and Lua, the execution pipeline consists of three main stages:

1. Lexical Analysis & Parsing

Source code is tokenized and parsed into an Abstract Syntax Tree (AST). The parser performs semantic analysis and type checking during this phase.

2. Bytecode Compilation

The AST is compiled into compact bytecode instructions. The compiler performs constant folding and other optimizations at compile-time.

3. Virtual Machine Execution

The bytecode is executed by a stack-based virtual machine with several performance optimizations.

Bytecode Compilation

Pebble compiles source code to bytecode before execution, providing several advantages:

  • Single-pass compilation: Fast compilation with minimal overhead
  • Compact representation: Bytecode is smaller than source code and faster to execute
  • Compile-time optimizations: Constant folding, dead code elimination
  • Fast local loads: Special opcodes for loading locals 0-7 without operands

Optimization: String Literal Folding

When concatenating string literals at compile-time, the compiler folds them into a single constant:

let msg = "Hello, " + "World!"  # Compiled as single string "Hello, World!"

NaN Tagging

One of Pebble's most significant optimizations is NaN tagging - a technique that allows storing different value types in a single 64-bit word without heap allocation.

How It Works

IEEE 754 double-precision floating point numbers use 64 bits. Not all bit patterns represent valid numbers - some represent "NaN" (Not a Number). Pebble uses these NaN bit patterns to encode different types:

  • Doubles: Stored directly (non-NaN bit patterns)
  • Integers: Stored in NaN payload (up to 48 bits)
  • Booleans: Encoded in NaN with type tag
  • Nil: Special NaN pattern
  • Pointers: Object references stored in NaN payload

Performance Benefits

  • Zero allocation for primitives: Numbers, bools, nil don't require heap memory
  • Fast type checking: Type can be determined with a few bit operations
  • Compact representation: All values fit in 64 bits (8 bytes)
  • Cache friendly: Values fit in CPU registers and cache lines

Real-world impact: NaN tagging eliminates millions of allocations in typical programs, reducing garbage collection pressure and improving cache locality.

String Optimization

Pebble employs several techniques to make string operations exceptionally fast:

1. Lazy Hash Computation

String hashes are only computed when a string is used as a map key, not on creation:

  • String concatenation and creation: No hashing overhead
  • Map insertion/lookup: Hash computed once and cached
  • Result: 938x faster string concatenation vs. eager hashing

2. Reusable String Builder

String concatenation uses a reusable buffer to avoid repeated allocations:

  • Single builder allocated per VM instance
  • Buffer grows as needed and maintains capacity
  • Reset between operations (no reallocation)

3. String Interning for Literals

String literals are interned at compile-time:

  • Each unique string literal stored once
  • Equality comparison: pointer comparison (O(1))
  • Reduced memory footprint
Benchmark Result: String concatenation is 114x faster than Python and 40x faster than Lua thanks to these optimizations.

O(1) Symbol Lookup

Pebble uses compile-time symbol resolution for blazing-fast variable access:

Local Variables

  • Resolved to stack indices at compile-time
  • Runtime access: single array lookup (O(1))
  • No hash table lookups or string comparisons
  • Special opcodes for accessing locals 0-7 (most common case)

Global Variables

  • Stored in a flat array with indices known at compile-time
  • Access via direct array index (O(1))
  • No runtime symbol table searches

Comparison: Many interpreted languages use hash tables for variable lookup (O(1) average, but with overhead). Pebble's compile-time resolution eliminates this overhead entirely.

Memory Management

Pebble uses a tri-color mark-and-sweep garbage collector - a simple yet effective approach:

Heap-Allocated Objects

  • Strings: Immutable, reference-counted implicitly through GC
  • Arrays: Dynamic arrays with automatic growth
  • Maps: Hash tables with linear probing
  • Records: Fixed-size structures with typed fields
  • Functions: Function objects with bytecode and metadata

Garbage Collection Strategy

  • Mark phase: Trace reachable objects from stack and globals
  • Sweep phase: Free unmarked objects
  • Triggered: When memory pressure increases
  • Conservative: Keeps objects longer to reduce GC frequency

Array Growth Strategy

Dynamic arrays double in capacity when full, minimizing reallocations:

  • Amortized O(1) append operations
  • Reduced memory allocation overhead
  • Better cache locality with contiguous storage

Performance Results

These optimizations combine to deliver exceptional real-world performance:

String Concatenation (50K × 5 strings)

  • Pebble: 25-39ms
  • Python: 2858ms (114x slower)
  • Lua: 1020ms (40x slower)

Array Operations (100K push/pop + 1K insert/remove)

  • Pebble: 38-45ms
  • Python: 39-46ms (≈ equal)
  • Lua: 874ms (22x slower)

Binary Trees (depth 20, 2M+ allocations)

  • Pebble: 583ms
  • Python: 2633ms (4.5x slower)
  • Lua: 667ms (1.1x slower)

Map Operations (100K set/get/delete)

  • Pebble: 133ms
  • Python: 137ms (≈ equal)
  • Lua: 197ms (1.5x slower)

All benchmarks run on Linux x86_64 with compiler optimizations enabled.

Summary

Pebble's performance comes from a combination of well-established techniques applied carefully:

  • ✓ NaN tagging eliminates primitive allocations
  • ✓ Lazy hash computation avoids unnecessary work
  • ✓ Compile-time symbol resolution for O(1) variable access
  • ✓ Reusable buffers minimize allocations
  • ✓ Bytecode compilation enables optimizations
  • ✓ Simple GC with low overhead

These optimizations make Pebble suitable for both educational use and real-world applications where performance matters.