Compute System

Three execution modes: Firecracker microVMs for lightweight workloads, QEMU full VMs for GPU and custom kernels, and serverless functions for event-driven compute.

Executor Architecture

The aleph-executor crate manages the full VM lifecycle. Key modules:

Module         Purpose
executor       High-level job execution orchestration
lifecycle      VM state machine (create, start, stop, destroy)
hypervisor     Hypervisor trait abstraction
firecracker    Firecracker microVM driver
qemu           QEMU/KVM full VM driver
volumes        Persistent volume management
cloud_init     Cloud-init configuration generation
metering       Resource usage tracking and reporting
functions      Serverless function runtime
migration      Live VM migration between nodes
model_serving  AI/ML model serving infrastructure
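The hypervisor module's trait abstraction is what lets the executor drive Firecracker and QEMU through a single interface. A minimal sketch of the idea, with illustrative trait and type names (not the crate's actual API):

```rust
use std::path::PathBuf;

// Hypothetical spec type; the real crate's fields will differ.
pub struct VmSpec {
    pub vcpus: u16,
    pub mem_mib: u32,
    pub kernel: PathBuf,
}

// Backend-agnostic lifecycle interface. The executor can hold a
// Box<dyn Hypervisor> and not care which driver is behind it.
pub trait Hypervisor {
    /// Boot a VM from the given spec, returning an opaque VM id.
    fn start(&mut self, spec: &VmSpec) -> Result<String, String>;
    /// Tear the VM down and release its resources.
    fn stop(&mut self, vm_id: &str) -> Result<(), String>;
}

pub struct FirecrackerDriver;

impl Hypervisor for FirecrackerDriver {
    fn start(&mut self, spec: &VmSpec) -> Result<String, String> {
        // A real driver would spawn the Firecracker process and
        // configure it over its API socket; here we just mint an id.
        Ok(format!("fc-{}cpu-{}mib", spec.vcpus, spec.mem_mib))
    }
    fn stop(&mut self, _vm_id: &str) -> Result<(), String> {
        Ok(())
    }
}
```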

Firecracker MicroVMs

Primary execution mode for lightweight workloads. Provides strong isolation with minimal overhead.

Sub-200ms Cold Start

MicroVMs boot in under 200ms, enabling true serverless-style compute.

Minimal Footprint

~5MB memory overhead per VM. Run hundreds of VMs on a single host.

Strong Isolation

Hardware-level isolation via KVM. Each VM gets its own kernel and network stack.

Configuration

// Firecracker VM configuration
struct FirecrackerConfig {
    kernel_image_path: PathBuf,
    rootfs_path: PathBuf,
    vcpu_count: u16,
    mem_size_mib: u32,
    network_interfaces: Vec<NetworkInterface>,
    drives: Vec<Drive>,
    boot_args: String,
}
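A filled-in configuration for a small microVM might look like the following. This is a self-contained sketch: the struct is repeated with NetworkInterface and Drive stubbed out, and the paths and boot arguments are illustrative, not the crate's defaults.

```rust
use std::path::PathBuf;

// Stubs so the sketch compiles standalone.
struct NetworkInterface;
struct Drive;

struct FirecrackerConfig {
    kernel_image_path: PathBuf,
    rootfs_path: PathBuf,
    vcpu_count: u16,
    mem_size_mib: u32,
    network_interfaces: Vec<NetworkInterface>,
    drives: Vec<Drive>,
    boot_args: String,
}

/// 1 vCPU / 128 MiB is a plausible baseline for a lightweight job.
fn small_vm_config() -> FirecrackerConfig {
    FirecrackerConfig {
        kernel_image_path: PathBuf::from("/var/lib/aleph/vmlinux"),
        rootfs_path: PathBuf::from("/var/lib/aleph/rootfs.ext4"),
        vcpu_count: 1,
        mem_size_mib: 128,
        network_interfaces: Vec::new(),
        drives: Vec::new(),
        // A quiet serial console and no PCI probing help keep boot fast.
        boot_args: "console=ttyS0 reboot=k panic=1 pci=off".to_string(),
    }
}
```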

QEMU Full VMs

For workloads requiring GPU passthrough, custom kernels, or full OS features.

GPU Passthrough

# QEMU with NVIDIA GPU passthrough (IOMMU/VFIO)
[executor.qemu]
enable_gpu = true
gpu_devices = ["0000:01:00.0"]  # PCI address
iommu_group = 1
vfio_driver = "vfio-pci"

GPU Scheduling

The scheduler matches GPU job requirements (gpuType, gpuVramMiB) against registered node capabilities. GPU types are tracked per-node in the NodeRegistry via _nodeGpuTypes.
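The matching logic can be sketched as a simple capability filter. Types and field names below are illustrative, not the actual scheduler code:

```rust
// A GPU advertised by a node in the NodeRegistry (illustrative type).
struct NodeGpu {
    gpu_type: String,
    vram_mib: u32,
}

// A job's GPU requirements, mirroring gpuType / gpuVramMiB.
struct GpuRequirement {
    gpu_type: String,
    gpu_vram_mib: u32,
}

/// A node satisfies the job if it advertises the requested GPU type
/// with at least the requested amount of VRAM.
fn node_matches(req: &GpuRequirement, node_gpus: &[NodeGpu]) -> bool {
    node_gpus
        .iter()
        .any(|g| g.gpu_type == req.gpu_type && g.vram_mib >= req.gpu_vram_mib)
}
```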

Serverless Functions

Event-driven compute using pre-warmed Firecracker pools. The functions module maintains a pool of idle microVMs to achieve near-instant invocation.

// Function invocation flow
// 1. Request arrives at API server
// 2. Scheduler finds available function slot
// 3. If warm pool available: <5ms invocation
// 4. If cold start needed: ~150ms boot + invoke
// 5. Response returned, VM returned to pool

pub struct FunctionRuntime {
    pool: VmPool,
    max_concurrent: usize,
    idle_timeout: Duration,
    max_execution_time: Duration,
}
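The warm-pool decision in the flow above can be sketched like this. It is a simplified model: a real pool also enforces max_concurrent, reaps VMs past idle_timeout, and kills executions that exceed max_execution_time.

```rust
use std::collections::VecDeque;

struct Vm {
    id: u64,
}

struct VmPool {
    idle: VecDeque<Vm>,
    next_id: u64,
}

impl VmPool {
    fn new() -> Self {
        VmPool { idle: VecDeque::new(), next_id: 0 }
    }

    /// Take a pre-warmed VM if one is idle (the <5ms path);
    /// otherwise boot a fresh microVM (~150ms cold start).
    /// Returns the VM and whether it was a warm hit.
    fn acquire(&mut self) -> (Vm, bool) {
        if let Some(vm) = self.idle.pop_front() {
            (vm, true)
        } else {
            self.next_id += 1;
            (Vm { id: self.next_id }, false)
        }
    }

    /// Return the VM to the pool for reuse by the next invocation.
    fn release(&mut self, vm: Vm) {
        self.idle.push_back(vm);
    }
}
```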

Cloud-Init

VMs are configured via cloud-init. The cloud_init module generates user-data and meta-data from job specifications.

#cloud-config (auto-generated)
hostname: job-0x1234abcd
users:
  - name: aleph
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... user@host
write_files:
  - path: /etc/aleph/job.env
    content: |
      JOB_ID=0x1234abcd
      NODE_ID=0x5678efgh
runcmd:
  - systemctl start application
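Generating the user-data above from a job specification amounts to templating. A hypothetical helper in that spirit (not the crate's actual cloud_init API, which also emits users, SSH keys, and meta-data):

```rust
/// Render a minimal #cloud-config user-data document for a job.
/// Only the hostname and job environment file are covered here.
fn render_user_data(job_id: &str, node_id: &str) -> String {
    let mut out = String::from("#cloud-config\n");
    out.push_str(&format!("hostname: job-{job_id}\n"));
    out.push_str("write_files:\n");
    out.push_str("  - path: /etc/aleph/job.env\n");
    out.push_str("    content: |\n");
    out.push_str(&format!("      JOB_ID={job_id}\n"));
    out.push_str(&format!("      NODE_ID={node_id}\n"));
    out
}
```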

VM Networking

Each VM gets a TAP interface managed by aleph-vm-networking. Traffic is routed through nftables rules for isolation and port forwarding.

// Network setup per VM
// 1. Create TAP interface (tap-{vm_id})
// 2. Assign IP from subnet pool (10.0.x.x/30)
// 3. Configure nftables for NAT + port forwarding
// 4. Attach TAP to Firecracker/QEMU

pub struct VmNetwork {
    tap_name: String,
    vm_ip: Ipv4Addr,
    host_ip: Ipv4Addr,
    forwarded_ports: Vec<PortForward>,
}
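Each /30 subnet spans four addresses (network, two usable hosts, broadcast), which is exactly enough for the host side of the TAP plus the guest. A sketch of deriving both addresses from a VM index, assuming the pool carves consecutive /30s out of 10.0.0.0/16 (the exact carving is an assumption):

```rust
use std::net::Ipv4Addr;

/// Derive (host_ip, vm_ip) for the n-th VM by taking the n-th
/// consecutive /30 subnet within 10.0.0.0/16. Offsets within the
/// /30: +0 network, +1 host side of TAP, +2 guest, +3 broadcast.
fn subnet_for(index: u32) -> (Ipv4Addr, Ipv4Addr) {
    let base = index * 4;
    let third = (base >> 8) as u8;
    let fourth = (base & 0xff) as u8;
    (
        Ipv4Addr::new(10, 0, third, fourth + 1), // host side of TAP
        Ipv4Addr::new(10, 0, third, fourth + 2), // guest
    )
}
```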

Resource Metering

The metering module tracks CPU, memory, network, and disk usage per VM using cgroups v2. Metrics are reported in heartbeats and used for payment calculation.
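Under cgroups v2, per-VM CPU time comes from the cgroup's cpu.stat file, a whitespace-separated key/value format documented by the kernel. A sketch of extracting the total usage; the surrounding reporting plumbing is omitted:

```rust
/// Extract usage_usec from a cgroups-v2 cpu.stat file body, e.g.:
///   usage_usec 123456
///   user_usec 100000
///   system_usec 23456
fn cpu_usage_usec(cpu_stat: &str) -> Option<u64> {
    cpu_stat.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        match (parts.next(), parts.next()) {
            (Some("usage_usec"), Some(v)) => v.parse().ok(),
            _ => None,
        }
    })
}
```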